Abstract

This paper provides a semiparametric model to estimate processes of the volatility, defined as the squared diffusion coefficient of a stochastic differential equation. Without assuming any functional form of the volatility function, we estimate the volatility process by filtering. We prove the consistency of the model in the sense that the estimated processes converge to the true ones as the number of observations ($N$) goes to infinity and the sampling time interval ($\Delta t$) goes to zero while $N\Delta t$ goes to infinity. We also carry out numerical experiments with stochastic differential equations with linear/nonlinear volatility functions in order to check whether the model can actually estimate the volatility, and we compare its performance with that of the local linear model.

Keywords: Diffusion process; Function estimation; Polynomial approximation; Spot 
volatility; State estimation; State space model. 

1 Introduction 

When modeling time series by continuous-time stochastic processes, we often face the difficult problem of choosing the functions for the drift and diffusion coefficients of a stochastic differential equation, since we have almost no prior knowledge about them. However, the specification of the diffusion coefficient is much more important for the modeling. In fact, recent research on financial time series finds only weak evidence of the nonlinearity in the drift suggested by, for example, Stanton (1997): Chapman and Pearson (2000) show through simulation studies that the test of nonlinearity is not robust, and Fan and Zhang (2003) develop an alternative test free from the problem of the method used by Stanton (1997) and find weak evidence against a linear drift for the Standard & Poor's 500 index as well as for the short-term interest rate. Furthermore, Sun (2003) and, more recently, Bali and Wu (2006) report similar results. Additionally, from a technical point of view, as pointed out by Bandi and Phillips (2003), the drift coefficient cannot be identified nonparametrically on a fixed time interval.

By contrast, these studies stress the nonlinearity in the diffusion coefficient, or the volatility, which is crucial for describing the time evolution of financial time series such as interest rate data. Moreover, the volatility does not face the identification problem that arises when estimating the drift coefficient on a fixed interval. The specification of the volatility is therefore of central importance for such modeling.

Among recent statistical models of volatility, the realized volatility is becoming one of the most successful tools for modeling and forecasting the volatility, and extensions of it have been studied intensively; see, for example, Thomakos et al (2002), Deo et al (2006), Engle and Gallo (2006), and Ghysels et al (2006). The realized volatility rests on the fact that the sum of squared increments over a time interval converges in probability, as the sampling interval shrinks, to the quadratic variation over that interval, which equals the volatility integrated over it, the so-called integrated volatility. Theoretical and numerical studies relevant to stochastic volatility models, such as Andersen et al (2003, 2004, 2005) and Barndorff-Nielsen and Shephard (2002, 2004), show that estimation by the realized volatility performs well. Though the realized volatility can effectively estimate the integrated volatility, it is still difficult to estimate the spot volatility, or the squared diffusion coefficient, which is defined as the integrand of the integrated volatility. This information is indispensable for setting up a stochastic differential equation and for using it for practical purposes. Moreover, the integrated volatility is easily recovered from the spot volatility, but the converse is not easy. To estimate the spot volatility, we usually need some information about its functional form beforehand, but such information is rarely available, since little is known about the functional form of the spot volatility.

The aim of this paper is to present a method of estimating the spot volatility, simply called the volatility in this paper, of a one-dimensional stochastic differential equation from discrete observations. Since we have no knowledge about what kind of function should be used for the volatility, we need to model it nonparametrically. The most straightforward way is to use a polynomial approximation. However, this approach does not seem successful, since the estimation of a polynomial function is not efficient, particularly when a higher-order polynomial is used. Conversely, we cannot use a lower-order polynomial either, since it leads to a poor approximation of the diffusion coefficient.

The drawback of this approach lies in fitting a polynomial globally. Hence, we could use local polynomial modeling as an alternative. This modeling is based on kernel regression, in which the regression function is expressed as a weighted average of several sub-regression functions, usually first- or second-order polynomials, whose weights are characterized by the so-called kernel function; see Fan and Gijbels (1996) and Campbell et al (1997), for example. In fact, kernel regression, more specifically the local polynomial model, has been used for estimating the volatility nonparametrically, from Florens-Zmirou (1993) to Stanton (1997), Fan and Yao (1998), Jacod (2000), Bandi and Phillips (2003) and Fan and Zhang (2003), which are fully nonparametric models, while Aït-Sahalia (1996) proposes a semiparametric model in which the functional form of the drift coefficient is known. Though local polynomial modeling does not suffer from the problems of higher-order polynomials, it instead has the problems of overfitting and bandwidth selection, as pointed out in Campbell et al (1997). In particular, overfitting is serious in forecasting the volatility.

To avoid these difficulties, we reconsider local polynomial modeling from a different point of view. In local polynomial modeling, though it is regarded as a nonparametric model, each polynomial over its window needs to be estimated parametrically. By contrast, this paper proposes a model that estimates processes of the volatility function at the observed states of the process without estimating its parametric functional form. Simply stated, every unobservable process of the volatility is constructed from the observable ones. Intuitively, the model looks like a local polynomial model with infinitesimal bandwidth. We then obtain a one-to-one correspondence between the observable and unobservable processes and plot them on the plane, which provides information on the functional form of the volatility.

This method depends solely on how the unobservable processes are estimated from the observable ones. State space modeling is one of the most popular methods for that purpose, since every unobservable state can easily be estimated from the observable states by Kalman filtering. So it seems we have only to set up a state space model in which the states of the volatility are defined as unobservable. Although this approach is straightforward, we cannot directly apply the updating formula of the Kalman filter to the problem under consideration, since a stochastic differential equation cannot necessarily be handled by its simple application. In this paper, we propose an alternative recursive updating formula and thereby obtain estimates of the volatility as filtered states. Thanks to the recursive updating, the prediction and filtering reflect the recent state of the process, so we do not have to care much about overfitting. Moreover, no bandwidth selection is required.

From a theoretical viewpoint, it is quite important whether the proposed model is consistent in the sense that the estimated processes converge to the true ones as, for example, the sampling time interval goes to zero. We present a proof of consistency, so that, in theory, we can estimate the true processes as accurately as desired by making the sampling time interval close to zero while making the total time span as large as possible. On the other hand, from a practical point of view, it is equally important whether we can feasibly implement the model. Using stochastic differential equations with linear/nonlinear volatilities, we carry out numerical experiments to see how well we can estimate volatility functions from discretely observed data, and we compare the performance of the proposed model with the local linear model, one of the local polynomial models. Additionally, we estimate integrated volatilities from the estimated volatility processes and compare them with those estimated by the realized volatility.

The organization of this paper is as follows. Section 2 proposes a model by which unobservable processes of the volatility function can be estimated from discrete time series of the process of interest. Section 3 discusses the consistency of the model by investigating the asymptotic behavior of the estimated processes. Section 4 reports Monte Carlo experiments that evaluate the performance of the model through comparison with the local linear model. Section 5 gives concluding remarks.

2 Semiparametric Model 

We consider a one-dimensional diffusion process $X_t$ which never explodes in finite time and satisfies the following stochastic differential equation (SDE) starting at a constant $X_0>0$:

$$dX_t = \mu(X_t;\eta)\,dt + \sigma(X_t)\,dB_t, \qquad (1)$$

where $\mu(x;\eta)$ is a linear/nonlinear function which is twice continuously differentiable with respect to $x$ and $\eta$, and $\{B_t,\mathcal F_t\}_{t\ge0}$ is a standard Brownian motion on the filtration $\{\mathcal F_t\}_{t\ge0}$. We define the volatility function $g(x)$ by $g(x)=\sigma(x)^2$ and assume $\int_0^t g(X_u)\,du<\infty$ for any $t<\infty$ almost surely. As in Aït-Sahalia (1996), we assume $\mu(x;\eta)$ is known up to an unknown parameter vector $\eta$, while $g(x)$ is completely unknown. We could instead assume that $\mu$ is completely unknown as well, provided we had a method to estimate $\mu$ and $\mu'$ consistently with rate of convergence $(N\Delta t)^{-1/2}$, where $N$ is the number of observations and $\Delta t$ is the sampling interval. Throughout this paper, however, we assume $\mu(x;\eta)$ is known while $\eta$ is unknown.

Suppose equidistant discrete times $0=t_0<t_1<\cdots<t_N=T$ with $\Delta t=T/N$. We observe the process $X_t$ at these discrete times, $\{X_{t_k}\}_{0\le k\le N}$. Under this setting, we want to estimate the discrete states $\{g(X_{t_k})\}_{0\le k\le N}$ from $\{X_{t_k}\}_{0\le k\le N}$.

First, suppose an approximation of $g$, denoted by $f$, given by the second-order Taylor expansion around $x_0$:

$$f(x) = g(x_0) + g'(x_0)(x-x_0) + \frac{g''(x_0)}{2}(x-x_0)^2. \qquad (2)$$

Replacing $x$ by the process $X_t$, we can approximate $g(X_t)$ by a quadratic function of $X_t$. If $x_0$ is fixed globally, this gives a global polynomial approximation. If instead $x_0$ is replaced by $X_s$, which changes with the choice of $s$, we get a local polynomial approximation. In the local polynomial approximation, coefficients such as $g'$ and $g''$ are constant over $[t_{k-1},t_k)$ with $s=t_{k-1}$, but not globally.

Hence, even if $g(x)$ is, for example, a cubic function, it can be well approximated by the local polynomial model of degree two, just as a smooth curve can be approximated piecewise by tangent lines. By contrast, the global polynomial model frequently gives a poor approximation, particularly when $g$ is highly nonlinear.
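To make the contrast between the global and local versions of (2) concrete, the Python sketch below takes $g(x)=x^3$ on $[0.5,2]$ (our own illustrative choice) and compares a single expansion around a fixed $x_0=1$ with expansions re-centred at each grid point and evaluated only one step $h$ ahead, which mimics the reset of the expansion point at every $t_{k-1}$.

```python
import numpy as np

def g(x):   return x ** 3          # illustrative "true" volatility function
def dg(x):  return 3 * x ** 2
def d2g(x): return 6 * x

def taylor2(x, x0):
    """Second-order Taylor expansion (2) of g around x0, evaluated at x."""
    return g(x0) + dg(x0) * (x - x0) + 0.5 * d2g(x0) * (x - x0) ** 2

xs = np.linspace(0.5, 2.0, 301)
h = xs[1] - xs[0]                  # grid step, playing the role of an increment of X

# Global approximation: one expansion point x0 = 1 for the whole range.
global_err = np.max(np.abs(g(xs) - taylor2(xs, 1.0)))

# Local approximation: re-centre at each grid point and evaluate one step ahead.
local_err = np.max(np.abs(g(xs[1:]) - taylor2(xs[1:], xs[:-1])))

print(f"global max error: {global_err:.4f}")
print(f"local  max error: {local_err:.2e} (h^3 = {h ** 3:.2e})")
```

For a cubic $g$ the remainder of (2) is exactly $(x-x_0)^3$, so the global error reaches 1 at $x=2$, while the local error is only $h^3$.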

In the local polynomial approximation, we define new processes $Y_t$, $Y'_t$ and $Y''_t$ by

$$Y_t = f(X_t), \qquad Y'_t = f'(X_t), \qquad Y''_t = f''(X_t).$$

To see how these processes evolve in time, we apply Itô's formula to $Y_t$, $Y'_t$ and $Y''_t$ on $t_{k-1}\le s<t<t_k$ and obtain

$$Y_t - Y_s = \int_s^t Y'_u\,dX_u + \frac12\int_s^t Y''_u\,d\langle X\rangle_u,$$
$$Y'_t - Y'_s = \int_s^t Y''_u\,dX_u,$$
$$Y''_t - Y''_s = 0.$$

The last equality implies that $Y''_t$ is constant over $[t_{k-1},t_k)$. However, we proceed as if $Y''_t$ were globally constant and denote it by $\theta$ in place of $Y''_t$. Using this, we rewrite the above system in differential form as follows:

$$dY_t = Y'_t\,dX_t + \frac{\theta}{2}\,d\langle X\rangle_t, \qquad (3)$$
$$dY'_t = \theta\,dX_t \qquad (4)$$

under the original SDE (1). Here we set $Y_s = g(X_s)$ at every $s\in\{t_k\}_{0\le k\le N}$. Hence a sample path of $Y_t$ is not necessarily continuous at $\{t_k\}_{0\le k\le N}$, but it is continuous over $[t_{k-1},t_k)$ for all $k$ ($1\le k\le N$).

Combining (1), (3) and (4), we can set up a system of SDEs. By assumption, we can observe $X_t$ but not $Y_t$, so we have to estimate $Y_t$ by filtering or a similar technique. The system, however, is not as tractable for this purpose as a linear system, which produces an estimate of an unobservable process such as $Y_t$ through, for example, the Kalman-Bucy filter. We therefore want another system as an approximation of the system (1), (3) and (4).

First, we consider a linear approximation of $\mu$ in (1) around $x_0$, denoted by $\tilde\mu(x;\eta)$:

$$\tilde\mu(x;\eta) = \mu(x_0;\eta) + \mu'(x_0;\eta)(x-x_0) = \mu(x_0;\eta) - \mu'(x_0;\eta)x_0 + \mu'(x_0;\eta)x.$$

We replace $x$, $x_0$ and $\eta$ by $X_t$, $X_s$ and some estimate $\hat\eta$ of $\eta$, respectively; $\hat\eta$ will later be taken to be a least squares estimate. For simplicity, we denote $\mu(X_s;\hat\eta)-\mu'(X_s;\hat\eta)X_s$ and $\mu'(X_s;\hat\eta)$ by $\alpha_s$ and $\beta_s$. Similarly, we replace $Y'_t$ in (3) by $Y'_s$, as in the Euler method. Then, to link the observable process and the volatility process as an unobservable one, we define $\tilde X_t$ and $\tilde Y_t$, as approximations of $X_t$ and $Y_t$, satisfying the following system of SDEs:

$$d\tilde X_t = (\alpha_s + \beta_s\tilde X_t)\,dt + \sqrt{\tilde Y_t}\,dB_t, \qquad (5)$$
$$d\tilde Y_t = Y'_s\,d\tilde X_t + \frac{\theta}{2}\,d\langle\tilde X\rangle_t, \qquad (6)$$
$$dY'_t = \theta\,dX_t \qquad (7)$$

for $t\in[t_{k-1},t_k)$ ($1\le k\le N$), with $s=t_{k-1}$. We take $\tilde X_{t_{k-1}} = X_{t_{k-1}}$ at the end-point; that is, as with $Y_t$, we reset the initial state of the approximate observable process to that of the original one at the discrete times. As for $\tilde Y_{t_{k-1}}$, it is defined recursively: initially $\tilde Y_0 = Y_0$, and then $\tilde Y_{t_{k-1}} = \lim_{s\uparrow t_{k-1}}\tilde Y_s$. Thereby $\tilde Y_t$ is a continuous process.

Note the differences between $Y_t$ and $\tilde Y_t$ and between $X_t$ and $\tilde X_t$, whereas $Y'_t$ is the same in the two systems. $X_t$ and $Y_t$ evolve in time according to (1) and (3), whereas $\tilde X_t$ and $\tilde Y_t$ evolve according to (5) and (6), respectively. So $X_t$ is continuous while $\tilde X_t$ is not necessarily continuous; conversely, $Y_t$ is not necessarily continuous while $\tilde Y_t$ is continuous. In both cases, however, $Y'_t$ is driven by $X_t$. Indeed, $Y'_t$ is given immediately by $Y'_t = Y'_s + \theta(X_t - X_s)$, or equivalently, $Y'_t = Y'_0 + \theta(X_t - X_0)$.

Though the system (5)-(7) looks like a stochastic volatility model, it differs in that it is derived from a stochastic differential equation with time-homogeneous drift and diffusion coefficients. Unlike stochastic volatility models, the system (5)-(7) is tractable, since its drift coefficients are locally linear in $\tilde X$ and $\tilde Y$. Hence the system can be solved explicitly on $[t_{k-1},t_k)$, and its conditional expectations with respect to $\mathcal F_s$ are easily obtained. To this end, we rewrite the system compactly as follows:

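$$dx_t = (Ax_t + \mathbf b)\,dt + \delta(x_t)\,dB_t, \qquad (8)$$

where

$$x_t = (\tilde X_t,\tilde Y_t)', \qquad A = \begin{pmatrix}\beta_s & 0\\ \beta_sY'_s & \theta/2\end{pmatrix}, \qquad \mathbf b = \begin{pmatrix}\alpha_s\\ \alpha_sY'_s\end{pmatrix}, \qquad \delta(x_t) = \sqrt{\tilde Y_t}\begin{pmatrix}1\\ Y'_s\end{pmatrix}.$$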


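For $t_{k-1}\le s<t<t_k$ and $\theta\ne0$ (assuming also $\beta_s\ne0$ and $\theta/2\ne\beta_s$, so that $A^{-1}$ and the expression below are well defined), its solution is given by

$$x_t = e^{A(t-s)}x_s + A^{-1}\big(e^{A(t-s)}-I\big)\mathbf b + \int_s^t e^{A(t-u)}\delta(x_u)\,dB_u, \qquad (9)$$

where

$$e^{At} = \begin{pmatrix} e^{at} & 0\\ \dfrac{b}{c-a}\big(e^{ct}-e^{at}\big) & e^{ct}\end{pmatrix}$$

with $a=\beta_s$, $b=\beta_sY'_s$ and $c=\theta/2$, and $I$ is the identity matrix. Hence the conditional mean and covariance given $\mathcal F_s$, denoted by $E_s[x_t]$ and $\mathrm{cov}_s(x_t)$ respectively, are

$$E_s[x_t] = e^{A(t-s)}x_s + A^{-1}\big(e^{A(t-s)}-I\big)\mathbf b, \qquad (10)$$
$$\mathrm{cov}_s(x_t) = E_s\Big[\int_s^t e^{A(t-u)}\delta(x_u)\delta(x_u)'e^{A(t-u)'}\,du\Big] = \int_s^t e^{A(t-u)}E[\delta(x_u)\delta(x_u)'\mid\mathcal F_s]\,e^{A(t-u)'}\,du. \qquad (11)$$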



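Note that the conditional mean of $x_t$ is linear in $x_s$, and all the components of $A$ and $\mathbf b$ are characterized by the local/global constants $\alpha_s$, $\beta_s$, $\theta$ and $Y'_s$. Since

$$E[\delta(x_u)\delta(x_u)'\mid\mathcal F_s] = E[\tilde Y_u\mid\mathcal F_s]\begin{pmatrix}1 & Y'_s\\ Y'_s & (Y'_s)^2\end{pmatrix},$$

$\mathrm{cov}_s(x_t)$ can be computed further by using the formula (10) for $E_s[x_u]$. After a somewhat cumbersome computation, we get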



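$$\mathrm{cov}_s(x_t) = \begin{pmatrix}\mathcal I_1 & p\,\mathcal I_1 + (q+Y'_s)\,\mathcal I_2\\ p\,\mathcal I_1 + (q+Y'_s)\,\mathcal I_2 & p^2\mathcal I_1 + 2p(q+Y'_s)\,\mathcal I_2 + (q+Y'_s)^2\mathcal I_3\end{pmatrix}, \qquad (12)$$

where, with $\Delta t = t-s$,

$$\mathcal I_1 = \mathcal Y_1\frac{e^{a\Delta t}(e^{a\Delta t}-1)}{a} + \mathcal Y_2\frac{e^{c\Delta t}(e^{(2a-c)\Delta t}-1)}{2a-c} + \mathcal Y_3\frac{e^{2a\Delta t}-1}{2a},$$
$$\mathcal I_2 = \mathcal Y_1\frac{e^{a\Delta t}(e^{c\Delta t}-1)}{c} + \mathcal Y_2\frac{e^{c\Delta t}(e^{a\Delta t}-1)}{a} + \mathcal Y_3\frac{e^{(a+c)\Delta t}-1}{a+c},$$
$$\mathcal I_3 = \mathcal Y_1\frac{e^{a\Delta t}(e^{(2c-a)\Delta t}-1)}{2c-a} + \mathcal Y_2\frac{e^{c\Delta t}(e^{c\Delta t}-1)}{c} + \mathcal Y_3\frac{e^{2c\Delta t}-1}{2c},$$

$$\mathcal Y_1 = pX_s + \frac{\alpha_sp}{c} - \frac{b\alpha_s}{ac}, \qquad \mathcal Y_2 = qX_s + \tilde Y_s + \frac{\alpha_s(q+Y'_s)}{c}, \qquad \mathcal Y_3 = \frac{b\alpha_s}{ac} - \frac{\alpha_sY'_s}{c},$$
$$p = -\frac{b}{c-a}, \qquad q = \frac{b}{c-a}.$$

Here $\mathcal Y_1$, $\mathcal Y_2$ and $\mathcal Y_3$ are the coefficients of $E_s[\tilde Y_u] = \mathcal Y_1e^{a(u-s)} + \mathcal Y_2e^{c(u-s)} + \mathcal Y_3$, the second component of (10). Since $\mathcal Y_2$ is linear in $\tilde Y_s$ and $\mathcal Y_1$ and $\mathcal Y_3$ do not depend on it, $\mathrm{cov}_s(x_t)$ is a linear function of $\tilde Y_s$, which we denote by $V_{t|s}(\tilde Y_s)$ for simplicity.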
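Closed forms such as (10) and (12) are easy to get wrong by hand, so a numerical cross-check can be useful. The Python sketch below (our own illustration; the function names and parameter values are hypothetical, and `dY_s` stands for $Y'_s$) evaluates $E_s[x_t]$ and $\mathrm{cov}_s(x_t)$ from the formulas above and compares them with a brute-force evaluation of (10) and (11) using SciPy's matrix exponential and vector-valued quadrature. It assumes $\beta_s\ne0$, $\theta\ne0$ and that $c-a$, $2a-c$, $2c-a$ and $a+c$ are all nonzero.

```python
import numpy as np
from scipy.integrate import quad_vec
from scipy.linalg import expm

def moments_closed_form(X_s, Y_s, alpha, beta, theta, dY_s, dt):
    """E_s[x_t] from (10) and cov_s(x_t) from (12), with x_t = (X~_t, Y~_t)'."""
    a, b, c = beta, beta * dY_s, theta / 2.0
    q = b / (c - a)
    p, r = -q, q + dY_s
    ea, ec = np.exp(a * dt), np.exp(c * dt)
    # Coefficients of E_s[Y~_u] = Y1 e^{a(u-s)} + Y2 e^{c(u-s)} + Y3.
    Y1 = p * X_s + alpha * p / c - b * alpha / (a * c)
    Y2 = q * X_s + Y_s + alpha * r / c
    Y3 = b * alpha / (a * c) - alpha * dY_s / c
    mean = np.array([ea * X_s + alpha / a * (ea - 1.0), Y1 * ea + Y2 * ec + Y3])
    I1 = (Y1 * ea * (ea - 1) / a + Y2 * ec * (np.exp((2 * a - c) * dt) - 1) / (2 * a - c)
          + Y3 * (np.exp(2 * a * dt) - 1) / (2 * a))
    I2 = (Y1 * ea * (ec - 1) / c + Y2 * ec * (ea - 1) / a
          + Y3 * (np.exp((a + c) * dt) - 1) / (a + c))
    I3 = (Y1 * ea * (np.exp((2 * c - a) * dt) - 1) / (2 * c - a) + Y2 * ec * (ec - 1) / c
          + Y3 * (np.exp(2 * c * dt) - 1) / (2 * c))
    cov = np.array([[I1, p * I1 + r * I2],
                    [p * I1 + r * I2, p ** 2 * I1 + 2 * p * r * I2 + r ** 2 * I3]])
    return mean, cov

def moments_numeric(X_s, Y_s, alpha, beta, theta, dY_s, dt):
    """Brute-force check: (10) via a matrix exponential, (11) via quadrature."""
    A = np.array([[beta, 0.0], [beta * dY_s, theta / 2.0]])
    bvec = np.array([alpha, alpha * dY_s])
    x_s = np.array([X_s, Y_s])
    v = np.array([1.0, dY_s])                  # delta(x_u) = sqrt(Y~_u) * v

    def mean_at(tau):                          # E_s[x_{s+tau}]
        E = expm(A * tau)
        return E @ x_s + np.linalg.solve(A, (E - np.eye(2)) @ bvec)

    def integrand(tau):                        # tau = u - s
        w = expm(A * (dt - tau)) @ v
        return (mean_at(tau)[1] * np.outer(w, w)).ravel()

    cov, _ = quad_vec(integrand, 0.0, dt)
    return mean_at(dt), cov.reshape(2, 2)

args = dict(X_s=1.0, Y_s=1.2, alpha=1.0, beta=-1.0, theta=0.5, dY_s=0.8, dt=0.1)
mean_cf, cov_cf = moments_closed_form(**args)
mean_num, cov_num = moments_numeric(**args)
print(np.allclose(mean_cf, mean_num), np.allclose(cov_cf, cov_num))
```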

These expectations cannot be used directly for the estimate of $\tilde Y_t$, which is regarded as an approximation of $g_t$, defined by $g_t = g(X_t)$, since they are conditional on $\mathcal F_s$, whereas we want an estimate conditional on the discrete observations. Let $\mathcal G_{t_k}$ be the σ-algebra generated by $\{X_{t_j}\}_{0\le j\le k}$, and abbreviate $t_k$ and $t_{k-1}$ as $t$ and $s$, respectively. We construct estimators of $x_t$ and $x_s$ conditional on $\mathcal G_s$, denoted by $x_{t|s}$ and $x_{s|s}$, as follows:

$$x_{t|s} = e^{A(t-s)}x_{s|s} + A^{-1}\big(e^{A(t-s)}-I\big)\mathbf b, \qquad (13)$$
$$x_{t|s} = (\tilde X_{t|s},\tilde Y_{t|s})', \qquad (14)$$
$$x_{s|s} = (\tilde X_{s|s},\tilde Y_{s|s})'. \qquad (15)$$

Note that $\tilde X_{s|s} = X_s$ belongs to $\mathcal G_s$, since $\tilde X_s = X_s$ by construction. Hence $\tilde X_{t|s} = E[\tilde X_t\mid\mathcal G_s] = E[\tilde X_t\mid\mathcal F_s]$, since $A$ and $\mathbf b$ belong to $\mathcal G_s\subset\mathcal F_s$. For $\tilde Y_{t|t}$, we construct an estimator of $\tilde Y_t$ conditional on $\mathcal G_t$ by

$$\tilde Y_{t|t} = \tilde Y_{t|s} + K(X_t - \tilde X_{t|s}), \qquad (16)$$
$$K = \frac{V_2(\tilde Y_{s|s})}{V_1(\tilde Y_{s|s})}, \qquad (17)$$

where $V_1$ and $V_2$ are the (1,1) and (1,2) elements of $V_{t|s}$. By formulas (13) and (16), $\tilde Y_{t|s}$ and $\tilde Y_{t|t}$ are updated recursively once the initial state is given by $\tilde Y_{0|0} = Y_0$. Thanks to the recursive formula, $\tilde Y_{t|t}\in\mathcal G_t$ for all $t\in\{t_k\}_{0\le k\le N}$. Indeed, $\tilde Y_{0|0}$ is known. Suppose $\tilde Y_{s|s}\in\mathcal G_s$. Then $\tilde Y_{t|s}\in\mathcal G_s$ by (13), and hence $\tilde Y_{t|t}\in\mathcal G_t$ by (16).

Note that $V_{t|s}(\tilde Y_{s|s})$, which is not necessarily equal to $V_{t|s}(\tilde Y_s)$, belongs to $\mathcal G_s$, since all the associated coefficients belong to $\mathcal G_s$. These formulas can be regarded as the prediction and filtering steps of the Kalman filter if the system (5)-(7) were a conventional linear system.
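The recursion (13)-(17) can be sketched in Python for the linear-drift case used later in Section 4, where $\mu(x)=\alpha+\beta x$ gives $\alpha_s=\alpha$ and $\beta_s=\beta$. This is our own illustration rather than the authors' code: the function name, the argument `dY0` for the initial slope $Y'_0$ (which the text treats as a given constant), and the simulated example are assumptions. Only $\mathcal I_1$ and $\mathcal I_2$ of (12) are needed, since the gain (17) uses $V_1$ and $V_2$ only.

```python
import numpy as np

def filter_volatility(X, dt, alpha, beta, theta, Y0, dY0):
    """Prediction (13) and filtering (16)-(17) for a linear drift mu(x) = alpha + beta*x.

    X   : observations X_{t_0}, ..., X_{t_N}
    Y0  : initial state Y~_{0|0}
    dY0 : initial slope Y'_0 (a user-supplied input in this sketch)
    Returns Y~_{t_k|t_k} (k = 0..N), the predictions X~_{t_k|t_{k-1}} and the
    variances V_1 (k = 1..N). Assumes beta != 0, theta != 0 and that the
    exponents 2a - c, a + c and c - a below are nonzero.
    """
    a, c = beta, theta / 2.0            # beta_s = beta, alpha_s = alpha for every s
    ea, ec = np.exp(a * dt), np.exp(c * dt)
    N = len(X) - 1
    Y_filt = np.empty(N + 1)
    Y_filt[0] = Y0
    X_pred, V1 = np.empty(N), np.empty(N)
    dY = dY0
    for k in range(1, N + 1):
        Xs, Ys = X[k - 1], Y_filt[k - 1]
        b = beta * dY
        q = b / (c - a)
        p, r = -q, q + dY
        # Coefficients of E_s[Y~_u] = Y1 e^{a(u-s)} + Y2 e^{c(u-s)} + Y3.
        Y1 = p * Xs + alpha * p / c - b * alpha / (a * c)
        Y2 = q * Xs + Ys + alpha * r / c
        Y3 = b * alpha / (a * c) - alpha * dY / c
        # Prediction step (13).
        X_pred[k - 1] = ea * Xs + alpha / a * (ea - 1.0)
        Y_pred = Y1 * ea + Y2 * ec + Y3
        # I_1 and I_2 of (12), evaluated at Y~_{s|s}.
        I1 = (Y1 * ea * np.expm1(a * dt) / a
              + Y2 * ec * np.expm1((2 * a - c) * dt) / (2 * a - c)
              + Y3 * np.expm1(2 * a * dt) / (2 * a))
        I2 = (Y1 * ea * np.expm1(c * dt) / c
              + Y2 * ec * np.expm1(a * dt) / a
              + Y3 * np.expm1((a + c) * dt) / (a + c))
        V1[k - 1] = I1
        # Filtering step (16)-(17): gain K = V_2 / V_1.
        K = (p * I1 + r * I2) / I1
        Y_filt[k] = Y_pred + K * (X[k] - X_pred[k - 1])
        # Y'_t = Y'_s + theta (X_t - X_s).
        dY += theta * (X[k] - Xs)
    return Y_filt, X_pred, V1

# Usage on a simulated path with g(x) = x (illustrative model, true alpha and beta).
rng = np.random.default_rng(1)
dt, N = 1 / 4000, 4000
X = np.empty(N + 1)
X[0] = 1.0
for k in range(N):
    X[k + 1] = X[k] + (1 - X[k]) * dt + np.sqrt(max(X[k], 0.0) * dt) * rng.standard_normal()
Y_filt, X_pred, V1 = filter_volatility(X, dt, alpha=1.0, beta=-1.0, theta=0.02, Y0=1.0, dY0=1.0)
print(f"mean |Y_filt - g(X)| = {np.mean(np.abs(Y_filt - X)):.4f}")
```

Since $K\approx Y'_s$ for small $\Delta t$, each update is close to $\tilde Y_{t|t}-\tilde Y_{s|s}\approx Y'_s(X_t-X_s)+(\theta/2)\tilde Y_{s|s}\Delta t$, a discretized version of (3); with $g(x)=x$ and a small $\theta$, the filtered states should therefore stay close to $X_t$.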

For practical purposes, we need to know the parameter vector $\eta$ and the nuisance parameter $\theta$. The parameter $\eta$ can be estimated by least squares, for example. As for $\theta$, we can take any nonzero number, and the consistency discussed in the next section still holds thanks to Theorem 1. For numerical efficiency, however, we can take a quasi-maximum likelihood estimate obtained by maximizing the following likelihood function:

$$p(X_{t_0},X_{t_1},\ldots,X_{t_N}) = p(X_{t_0})\prod_{k=1}^{N}\big(2\pi HV_{t_k|t_{k-1}}(\tilde Y_{t_{k-1}|t_{k-1}})H'\big)^{-1/2}\exp\left(-\frac{(X_{t_k}-Hx_{t_k|t_{k-1}})^2}{2HV_{t_k|t_{k-1}}(\tilde Y_{t_{k-1}|t_{k-1}})H'}\right), \qquad (18)$$

where $H = (1,0)$.
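Given the one-step predictions $Hx_{t_k|t_{k-1}}$ and variances $HV_{t_k|t_{k-1}}H'$ produced by a filter such as (13)-(17), the logarithm of (18) can be evaluated as below. This is a sketch with hypothetical names; it drops the factor $p(X_{t_0})$, which we treat as not depending on $\theta$. To estimate $\theta$, one would rerun the filter for each candidate value (for instance over a grid, or inside a numerical optimizer) and keep the maximizer.

```python
import numpy as np

def quasi_log_likelihood(X, X_pred, V1):
    """Logarithm of (18), without the factor p(X_{t_0}).

    X      : observations X_{t_0}, ..., X_{t_N}
    X_pred : H x_{t_k|t_{k-1}}, i.e. the predicted X~_{t_k|t_{k-1}}, k = 1..N
    V1     : H V_{t_k|t_{k-1}} H', the (1,1) element of the predicted covariance
    """
    X, X_pred, V1 = (np.asarray(z, dtype=float) for z in (X, X_pred, V1))
    resid = X[1:] - X_pred                      # innovations X_{t_k} - H x_{t_k|t_{k-1}}
    return -0.5 * np.sum(np.log(2.0 * np.pi * V1) + resid ** 2 / V1)
```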
3 Consistency of the model

First, we set up the following conditions:

1. Observation: Suppose equidistant discrete times $0=t_0<t_1<\cdots<t_N=T$ with $\Delta t=T/N$. Let a time $\tau$ be arbitrarily fixed; the discrete times are taken to pass through $\tau$, that is, $t_n=\tau$ for some $n$. The process $X_t$ is observed at the discrete times, and we denote the observations by $\{X_{t_k}\}_{0\le k\le N}$.

2. Lipschitz condition: $\mu(x;\eta)$ is twice continuously differentiable with respect to $x$ and $\eta$, and $g(x)$ and $\sqrt{g(x)}$ are twice continuously differentiable as well. Moreover, $\mu$ and $\sqrt g$ satisfy the Lipschitz conditions; that is, there is a constant $L$ such that

$$|\mu(x)-\mu(y)|\le L|x-y|, \qquad (19)$$
$$|\sqrt{g(x)}-\sqrt{g(y)}|\le L|x-y|. \qquad (20)$$

3. Localization: First, we assume $X_t$ and $\tilde Y_t$ never explode in finite time and never reach zero. Let $M$ be an arbitrarily given positive number. Using the stopping time $\mathcal T=\bigwedge_{1\le j\le3}T_j$ with $T_j$ given below, we define stopped processes such as $X_t = X_{t\wedge\mathcal T}$, $\tilde X_t = \tilde X_{t\wedge\mathcal T}$, $Y_t = Y_{t\wedge\mathcal T}$ and $\tilde Y_t = \tilde Y_{t\wedge\mathcal T}$, where

$$T_1 = \inf\{t\ge0: X_t\ge M \text{ or } \langle X\rangle_t\ge M\}, \qquad (21)$$
$$T_2 = \inf\{t\ge0: \tilde Y_t\ge M\}, \qquad (22)$$
$$T_3 = \inf\{t\ge0: \tilde Y_t\le0\}. \qquad (23)$$

Since $X_t$ and $\tilde Y_t$ are continuous processes, the $T_j$'s are all well defined as stopping times; see Karatzas and Shreve (1991), for example. By the above definition, we can assume $X_t$ and $\tilde Y_t$ are bounded. Moreover, $Y_t$ and $Y'_t$ can be assumed to be bounded, too. Indeed, from (2), $Y_t = Y_s + Y'_s(X_t-X_s) + (\theta/2)(X_t-X_s)^2$, while $Y'_t = Y'_s + \theta(X_t-X_s)$, or $Y'_t = Y'_0 + \theta(X_t-X_0)$. Hence $Y'_t$ is bounded, and so is $Y_t$. Here note $Y_s = g(X_s)$.

By the localization, we first assume $X_t$, $Y_t$ and $\tilde Y_t$ are all bounded and prove the following theorems for the bounded processes. Then, by letting $M\to\infty$, we obtain the final result.

4. Initial state: The initial states $X_0$, $Y_0$, $Y'_0$ and $\tilde Y_0$ are given constants. In particular, we assume $Y_0 = \tilde Y_0 = \tilde Y_{0|0}$.

5. Asymptotics: We consider $N\to\infty$, $\Delta t\to0$ and $N\Delta t\to\infty$ simultaneously. Here $\Delta t = T/N$.

6. Consistent estimate $\hat\eta$: We assume we can estimate $\eta$ consistently with rate of convergence $(N\Delta t)^{-1/2}$. The least squares estimator is such a method, for example; see Prakasa Rao (1983). Other estimators of drift coefficients are known to have the same rate of convergence; see Florens-Zmirou (1989), Yoshida (1992) and Kessler (1997).
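Under the above conditions, we want to show that the model is consistent in the sense that $E|g_t-\tilde Y_{t|t}|^2$ converges to zero as $N\to\infty$, $\Delta t\to0$ and $N\Delta t\to\infty$. Instead of evaluating this measure directly, we separately evaluate the distances between $g_t$ and $Y_t$, between $Y_t$ and $\tilde Y_t$, and between $\tilde Y_t$ and $\tilde Y_{t|t}$. Recall that $Y_t$ evolves in time as follows:

$$dX_t = \mu(X_t;\eta)\,dt + \sqrt{g(X_t)}\,dB_t,$$
$$dY_t = Y'_t\,dX_t + \frac{\theta}{2}\,d\langle X\rangle_t,$$
$$dY'_t = \theta\,dX_t.$$

On the other hand, $\tilde Y_t$ follows the system

$$d\tilde X_t = \tilde\mu(\tilde X_t;\hat\eta)\,dt + \sqrt{\tilde Y_t}\,dB_t,$$
$$d\tilde Y_t = Y'_s\,d\tilde X_t + \frac{\theta}{2}\,d\langle\tilde X\rangle_t,$$
$$dY'_t = \theta\,dX_t,$$

where $\tilde\mu(\tilde X_t;\hat\eta) = \alpha_s+\beta_s\tilde X_t$ as in (5). Since

$$(g_t-\tilde Y_{t|t})^2 = \{(g_t-Y_t)+(Y_t-\tilde Y_t)+(\tilde Y_t-\tilde Y_{t|t})\}^2 \le 3\{(g_t-Y_t)^2+(Y_t-\tilde Y_t)^2+(\tilde Y_t-\tilde Y_{t|t})^2\},$$

we only have to show that $E|g_t-Y_t|^2$, $E|Y_t-\tilde Y_t|^2$ and $E|\tilde Y_t-\tilde Y_{t|t}|^2$ converge to zero. In fact, we can show the following theorems.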

Theorem 1 For any $t\in\{t_k\}_{0\le k\le N}$, $\lim_{\Delta t\to0}E|g_t-Y_t|^2 = 0$.

Proof: This follows immediately from Lemmas 1 and 3 in the appendix.

Theorem 1 implies that $Y_t$ converges to $g_t$ regardless of $\theta$. So, in theory, we do not have to care about the asymptotic properties of $\theta$ as far as the consistency of the proposed model is concerned.

Theorem 2 Let $\tau$ be arbitrarily fixed. Suppose equidistant discrete times passing through $\tau$; that is, $0=t_0<t_1<\cdots<t_n=\tau<\cdots<t_N=T$ for some $n$. Then $E|Y_\tau-\tilde Y_\tau|^2\to0$ as $N\to\infty$, $\Delta t\to0$ and $N\Delta t\to\infty$.

Proof: See the appendix.

Theorem 3 In the same setting as Theorem 2, $E|\tilde Y_\tau-\tilde Y_{\tau|\tau}|^2\to0$ as $N\to\infty$, $\Delta t\to0$ and $N\Delta t\to\infty$.

Proof: See the appendix.

Finally, we obtain the following.

Theorem 4 In the same setting as Theorem 2, $E|g_\tau-\tilde Y_{\tau|\tau}|^2\to0$ as $N\to\infty$, $\Delta t\to0$ and $N\Delta t\to\infty$.
4 Numerical experiments

First, we try to estimate the curves of volatility functions by plotting the pairs $(X_t,\tilde Y_{t|t})$ estimated by the proposed model. Section 3 guarantees the consistency of the proposed model, so we want to confirm it numerically by seeing how the estimates behave as the sampling interval goes to zero.

Next, we compare the performance of the proposed model with the local linear model, that is, the local polynomial model in which a linear function is fitted locally. Following Fan and Zhang (2003), the local linear model used here is briefly explained as follows.

Denoting the volatility function by $m(x)$ and considering a neighborhood of $x_0$, $m(x)$ is locally approximated by $m(x)\approx\hat m(x) = \beta_0 + \beta_1(x-x_0)$, where the coefficients $\beta_0$ and $\beta_1$ are obtained by minimizing the objective function

$$\sum_{k=1}^N\{Z_{t_k} - \beta_0 - \beta_1(X_{t_{k-1}}-x_0)\}^2K_h(X_{t_{k-1}}-x_0).$$

Here $Z_{t_k} = (X_{t_k}-X_{t_{k-1}})^2/\Delta t$ and $K_h(\cdot) = K(\cdot/h)/h$, where $\Delta t$ is the sampling interval, $K(\cdot)$ is a kernel function and $h$ is a bandwidth. We use the Epanechnikov kernel defined by $K(u) = (3/4)(1-u^2)I(|u|\le1)$, where $I(\cdot)$ is the indicator function.
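The local linear estimator just described can be sketched in a few lines of Python. This is our own minimal version, assuming a single evaluation point `x0` and solving the weighted normal equations directly; the simulated example with $m(x)=x$ is illustrative only.

```python
import numpy as np

def epanechnikov(u):
    """K(u) = (3/4)(1 - u^2) I(|u| <= 1)."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def local_linear_volatility(X, dt, x0, h):
    """Local linear estimate m_hat(x0) = beta_0 from the weighted least squares above."""
    X = np.asarray(X, dtype=float)
    Z = np.diff(X) ** 2 / dt              # Z_{t_k}
    d = X[:-1] - x0                       # X_{t_{k-1}} - x0
    w = epanechnikov(d / h) / h           # K_h(X_{t_{k-1}} - x0)
    D = np.column_stack([np.ones_like(d), d])
    WD = D * w[:, None]
    # Normal equations (D' W D) beta = D' W Z; singular if no X_{t_{k-1}} falls in the window.
    beta0, beta1 = np.linalg.solve(D.T @ WD, WD.T @ Z)
    return beta0

rng = np.random.default_rng(4)
dt, N = 1 / 16000, 16000
X = np.empty(N + 1)
X[0] = 1.0
for k in range(N):  # dX = (1 - X) dt + sqrt(X) dB, so the true volatility is m(x) = x
    X[k + 1] = X[k] + (1 - X[k]) * dt + np.sqrt(max(X[k], 0.0) * dt) * rng.standard_normal()
m_hat = local_linear_volatility(X, dt, x0=1.0, h=0.15)
print(f"m_hat(1.0) = {m_hat:.3f} (true value 1.0)")
```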

Throughout the numerical experiments, we consider the case in which $\mu$ is linear, since least squares estimation (LSE) then produces a consistent estimate of $\eta$ with rate of convergence $(N\Delta t)^{-1/2}$. Let $\mu(x) = \alpha+\beta x$ where $\eta=(\alpha,\beta)$; note that in this case $\alpha_s=\alpha$ and $\beta_s=\beta$ for all $s$. Then $E[X_t\mid\mathcal F_s] = X_s + (\alpha/\beta + X_s)(e^{\beta(t-s)}-1)$, so we can obtain the estimates of $\alpha$ and $\beta$ by minimizing $\sum_{k=1}^N(X_{t_k}-E[X_{t_k}\mid\mathcal F_{t_{k-1}}])^2$ with respect to $\alpha$ and $\beta$. The nuisance parameter $\theta$, on the other hand, is estimated by quasi-MLE with (18).
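Because the conditional mean is affine in $X_s$, namely $c_0+c_1X_s$ with $c_1=e^{\beta\Delta t}$ and $c_0=(\alpha/\beta)(e^{\beta\Delta t}-1)$, the least squares problem above can be solved by an ordinary regression of $X_{t_k}$ on $X_{t_{k-1}}$ followed by the inverse map $\beta=\log c_1/\Delta t$, $\alpha=c_0\beta/(c_1-1)$. The sketch below uses this reparametrization, which is our own shortcut (valid when the fitted $c_1>0$ and $c_1\ne1$) rather than a procedure stated in the text; the noise-free data are only a sanity check.

```python
import numpy as np

def lse_linear_drift(X, dt):
    """Least squares estimate of (alpha, beta) in mu(x) = alpha + beta * x.

    The conditional mean X_s + (alpha/beta + X_s)(exp(beta*dt) - 1) equals
    c0 + c1 * X_s with c1 = exp(beta*dt) and c0 = (alpha/beta)(c1 - 1), so the
    criterion is minimized by an ordinary regression of X_{t_k} on X_{t_{k-1}},
    followed by mapping (c0, c1) back to (alpha, beta).
    """
    X = np.asarray(X, dtype=float)
    D = np.column_stack([np.ones(len(X) - 1), X[:-1]])
    (c0, c1), *_ = np.linalg.lstsq(D, X[1:], rcond=None)
    if c1 <= 0.0 or c1 == 1.0:
        raise ValueError("fitted slope must be positive and different from 1")
    beta = np.log(c1) / dt
    alpha = c0 * beta / (c1 - 1.0)
    return alpha, beta

# Noise-free check: data that follow the conditional mean exactly with alpha = 1, beta = -1.
dt = 0.01
c1_true = np.exp(-dt)
X = [3.0]
for _ in range(500):
    X.append((1.0 - c1_true) + c1_true * X[-1])   # c0 = (alpha/beta)(c1 - 1) = 1 - c1
alpha_hat, beta_hat = lse_linear_drift(X, dt)
print(alpha_hat, beta_hat)
```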

4.1 Estimation of volatility process 

Here we consider the following SDEs:

$$dX_t = (1-X_t)\,dt + \sqrt{X_t}\,dB_t,$$
$$dX_t = (1-X_t)\,dt + \sqrt{X_t^2}\,dB_t,$$
$$dX_t = (1-X_t)\,dt + \sqrt{X_t^3}\,dB_t,$$
$$dX_t = (1-X_t)\,dt + \sqrt{X_te^{1-X_t}}\,dB_t,$$

with $X$ starting at 1 and the total time span fixed at 1. Applying the two models to the above examples, we estimate the volatility processes.

Data are generated by the Euler method with a data-generating time interval of 1/1.28 × 10^, while observations are sampled from them at sampling time intervals $\Delta t$ = 1/4,000, 1/8,000 and 1/16,000. We set the first 1/2 period as a burn-in time in order to avoid the influence of the starting value of $X$. The subsequent 1 period is then used for estimation, except that its first 1/40 period is used for estimating the initial state of $Y$ from the squared differences of $X$; the initial state is given by $\frac{1}{m\Delta t}\sum_{k=1}^m(X_{s_k}-X_{s_{k-1}})^2$ for $\{s_k\}_{0\le k\le m}$, where $m$ depends on $\Delta t$ since the period for estimating the initial state is fixed at 1/40.
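The data-generation scheme can be sketched as follows. The fine Euler step `fine_dt` (8 substeps per finest sampling interval) is our own choice; the burn-in of 1/2, the estimation span of 1 and the 1/40 window for the initial state follow the description above, and the function names are hypothetical.

```python
import numpy as np

def euler_path(x0, mu, sigma, fine_dt, n_steps, rng):
    """Euler-Maruyama path of dX = mu(X) dt + sigma(X) dB on the fine grid."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    dB = rng.normal(0.0, np.sqrt(fine_dt), n_steps)
    for k in range(n_steps):
        x[k + 1] = x[k] + mu(x[k]) * fine_dt + sigma(x[k]) * dB[k]
    return x

def sample_observations(path, fine_dt, dt, burn_in=0.5, span=1.0):
    """Keep every (dt / fine_dt)-th point after the burn-in, over `span` time units."""
    step = int(round(dt / fine_dt))
    start = int(round(burn_in / fine_dt))
    stop = start + int(round(span / fine_dt))
    return path[start:stop + 1:step]

def initial_volatility(obs, dt, init_period=1 / 40):
    """Initial state of Y: squared increments over the first init_period, divided by m*dt."""
    m = int(round(init_period / dt))
    return np.sum(np.diff(obs[: m + 1]) ** 2) / (m * dt)

rng = np.random.default_rng(2)
fine_dt = 1 / 128_000                      # assumption: 8 substeps of the finest sampling interval
mu = lambda x: 1.0 - x
sigma = lambda x: np.sqrt(max(x, 0.0))     # first example: g(x) = x
path = euler_path(1.0, mu, sigma, fine_dt, int(round(1.5 / fine_dt)), rng)

results = {}
for dt in (1 / 4000, 1 / 8000, 1 / 16000):
    obs = sample_observations(path, fine_dt, dt)
    results[dt] = (len(obs) - 1, initial_volatility(obs, dt))
    print(f"dt = 1/{round(1 / dt)}: {results[dt][0]} intervals, Y_0 = {results[dt][1]:.3f}")
```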
From the discrete time series generated as above, we estimate $\alpha$ and $\beta$ by LSE and $\theta$ by quasi-MLE for the proposed model, and $\beta_0$ and $\beta_1$ by least squares for the local linear model. In the local linear model, the bandwidth $h$ is chosen by visual inspection: we take $h=0.15$ for the first example, 0.13 for the second, 0.12 for the third and 0.10 for the last. Then, using these estimates, we compute $\tilde Y_{t|t}$ ($t\in\{t_k\}_{1\le k\le N}$) for the proposed model and $\hat m$ for its counterpart.

Figures 1 through 4 display the results of estimating the volatility processes. The left column shows the true curves of the volatility functions as solid lines and the curves estimated by the proposed model as dotted lines; the right column shows the true curves and those estimated by the local linear model in the same way. The estimated curves clearly converge to the true ones as $\Delta t$ becomes shorter, and the convergence is particularly pronounced for the proposed model. Comparing the curves estimated by the two models, the proposed model produces smoother curves, while those of the local linear model are somewhat wiggly. The wiggly curves might suggest too small a bandwidth, but the results are almost the same or worse for other choices. In any case, the experiments numerically confirm the consistency of the proposed model proved in the previous section.



4.2 Out-of-sample estimation

To evaluate the estimates given by the proposed model, we compare its performance with that of the local linear model out of sample. To this end, we simulate 1,000 sample paths with $\Delta t$ = 1/16,000 and a data-generating time interval of 1/3.2 × 10^. For every sample path, we use the first 2,000 data points for parameter estimation and then estimate the volatility states for the last 2,000 data points. The estimation error is measured by the root mean squared error (RMSE) over the 2,000 states. We compare the performance of the two models using the sample mean and standard deviation of the 1,000 RMSEs.

Here we consider frequently used interest rate models as follows: 

$$dX_t = (0.0184 - 0.2146X_t)\,dt + 0.0783\sqrt{X_t}\,dB_t \qquad \text{(lin)}$$

dXt = {0.0073 - 0.imXt)dt + 0.2596XtdBt (quad) 

$$dX_t = (0.0408 - 0.5921X_t)\,dt + 1.2924X_t^{1.5}\,dB_t \qquad \text{(cube)}$$

dXt = {0.007A-0.1180Xt)dt + 0.0713X^™^dBt (nlin) 

The parameters of the first and fourth examples are taken from Fan and Zhang (2003), those of the second from Takamizawa and Shoji (2004), and those of the third from Chan et al (1992). The first example has a linear volatility function, the second a quadratic one, the third a cubic one, and the last a nonlinear one.

Data are generated from the starting value $X_0 = 0.1$. As in the previous experiment, the first 2,000 data points are discarded in order to remove the influence of the choice of starting value. The results are presented in Table 1. Except for (lin), the proposed model (semi) performs better in mean than the local linear model (ker). In particular, the standard deviations of the RMSEs show that the performance of the proposed model is markedly more stable.
4.3 Estimation of integrated volatility

It is interesting to construct the integrated volatility from the spot volatilities estimated in the previous section and to compare it with the realized volatility $R$, given by $\sum_{k=1}^n(X_{t_k}-X_{t_{k-1}})^2$. Here we use the following approximation of the integrated volatility:

$$\int_s^t\hat\sigma^2(X_u)\,du \approx \sum_{k=1}^n\hat\sigma^2(X_{t_{k-1}})\,\Delta t,$$

where $\hat\sigma$ stands for the estimate of the diffusion coefficient, $s=t_0<t_1<\cdots<t_n=t$, and $\Delta t = t_k-t_{k-1}$. Let $V_{\mathrm{semi}}$ and $V_{\mathrm{ker}}$ be the approximate integrated volatilities computed from the spot volatilities estimated by the proposed model and by the local linear model, respectively. In the same setting as the previous section, we compute these integrated volatilities out of sample; that is, $V_{\mathrm{semi}}$, $V_{\mathrm{ker}}$ and $R$ are computed from the last 2,000 data points. We then compute the differences $R-V_{\mathrm{semi}}$ and $R-V_{\mathrm{ker}}$ for each sample path and summarize them by their mean and standard deviation over the 1,000 sample paths. The results are presented in Table 2. The differences are almost the same for the proposed model and the local linear model, and both models underestimate the integrated volatility compared with the realized volatility. Furthermore, judging from the standard deviations, the difference between $V_{\mathrm{semi}}$ and $V_{\mathrm{ker}}$ is quite small compared with that in the estimation of the spot volatility. Given the stable estimation by (semi), this may imply that the realized volatility is volatile enough to mask the difference between the two models.
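The comparison in this subsection can be sketched as follows: compute $R$ and the left Riemann sum $V$ of the estimated spot volatilities on each path, then summarize $R-V$ across paths by its mean and standard deviation. In this illustration (ours, with hypothetical names and a smaller number of paths), the true $g(x)=x$ stands in for the spot estimates, so the differences reflect only the sampling noise of $R$.

```python
import numpy as np

def realized_volatility(X):
    """R = sum_k (X_{t_k} - X_{t_{k-1}})^2."""
    return float(np.sum(np.diff(X) ** 2))

def integrated_from_spot(spot, dt):
    """Left Riemann sum sum_k sigma_hat^2(X_{t_{k-1}}) dt; spot holds values at t_0..t_n."""
    return float(np.sum(spot[:-1]) * dt)

def difference_stats(paths, spots, dt):
    """Mean and standard deviation over sample paths of R - V."""
    diffs = np.array([realized_volatility(X) - integrated_from_spot(s, dt)
                      for X, s in zip(paths, spots)])
    return diffs.mean(), diffs.std(ddof=1)

# Illustration: dX = (1 - X) dt + sqrt(X) dB, with the true g(x) = x used as "spot estimates".
rng = np.random.default_rng(3)
dt, n, n_paths = 1 / 16000, 2000, 200
paths, spots = [], []
for _ in range(n_paths):
    X = np.empty(n + 1)
    X[0] = 1.0
    eps = rng.standard_normal(n)
    for k in range(n):
        X[k + 1] = X[k] + (1 - X[k]) * dt + np.sqrt(max(X[k], 0.0) * dt) * eps[k]
    paths.append(X)
    spots.append(np.maximum(X, 0.0))
mean_diff, sd_diff = difference_stats(paths, spots, dt)
print(f"mean(R - V) = {mean_diff:.5f}, sd(R - V) = {sd_diff:.5f}")
```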



5 Concluding remarks 

This paper proposed a semiparametric model for estimating the volatility, defined as the squared diffusion coefficient of a stochastic differential equation. The volatility was approximated by a second-order polynomial with stochastic coefficients, and we thereby set up a vector process consisting of observable and unobservable processes, in which the volatility process is an unobservable one. Using a recursive updating formula, the volatility processes were estimated by filtering.

From a theoretical viewpoint, we proved the consistency of the proposed model in the sense that the estimated processes converge to the true ones as the sampling interval goes to zero while the total time span goes to infinity.

From a numerical viewpoint, we carried out Monte Carlo experiments in which the unobservable volatility processes were well estimated, and we confirmed the consistency numerically using stochastic differential equations with linear/nonlinear diffusion coefficients. Furthermore, in a performance comparison with the local linear model, the proposed model showed better volatility estimation than the local linear model in terms of the mean and standard deviation of the estimation errors.
6 Appendix for proofs

In the following, $E_s[\cdot]$ stands for $E[\cdot\mid\mathcal F_s]$.

Lemma 1 For $s<t$ with $\Delta t = t-s$ and any positive integer $m$, $E_s|X_t-X_s|^{2m}$ is of order $O((\Delta t)^m)$. That is, there is a constant $K_m$ depending on $m$ such that

$$E_s|X_t-X_s|^{2m}\le K_m(\Delta t)^m$$

for sufficiently small $\Delta t$.

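proof: $\mu(\cdot)$ stands for $\mu(\cdot;\eta)$ in this proof. We prove the lemma by induction on m. 
First, consider m = 1. By Itô's formula, 

$$(X_t - X_s)^2 = 2\int_s^t \mu(X_u)(X_u - X_s)\,du + \int_s^t g_u\,du + 2\int_s^t (X_u - X_s)\sqrt{g_u}\,dB_u$$
$$\le \int_s^t \{\mu(X_u)^2 + (X_u - X_s)^2\}\,du + \int_s^t g_u\,du + 2\int_s^t (X_u - X_s)\sqrt{g_u}\,dB_u$$
$$\le \int_s^t L_1\,du + \int_s^t (X_u - X_s)^2\,du + 2\int_s^t (X_u - X_s)\sqrt{g_u}\,dB_u,$$

where $L_1$ stands for some constant, which exists since $X_u$ is bounded, and so are $\mu(X_u)$ and $g(X_u)$. 
Applying the conditional expectation at time s (the stochastic integral has zero conditional mean), 

$$E_s|X_t - X_s|^2 \le L_1\Delta t + \int_s^t E_s|X_u - X_s|^2\,du.$$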

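By the Gronwall inequality (see Karatzas and Shreve (1991), for example), we get 

$$E_s|X_t - X_s|^2 \le L_1\Delta t + \int_s^t L_1(u - s)e^{t-u}\,du.$$

The second term has the order of $(\Delta t)^2$. Indeed, since 

$$\int_s^t (u - s)e^{t-u}\,du = \int_0^{\Delta t} u\,e^{\Delta t - u}\,du,$$

l'Hôpital's rule gives 

$$\lim_{\Delta t\to 0}\frac{\int_s^t (u - s)e^{t-u}\,du}{(\Delta t)^2} = \lim_{\Delta t\to 0}\frac{\int_0^{\Delta t} u\,e^{\Delta t - u}\,du}{(\Delta t)^2} = \lim_{\Delta t\to 0}\frac{\Delta t + \int_0^{\Delta t} u\,e^{\Delta t - u}\,du}{2\Delta t} = \frac{1}{2}.$$

The claim holds for m = 1. 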
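The order claim for m = 1 can also be checked in closed form: substituting w = Δt - u gives ∫_0^{Δt} u e^{Δt-u} du = e^{Δt} - 1 - Δt, whose ratio to (Δt)^2 tends to 1/2. The short Python check below is our own illustration, not part of the original proof; it compares the closed form with a midpoint-rule quadrature and prints the ratio for shrinking Δt.

```python
import math


def integral_closed_form(h):
    """Exact value of the integral of u * exp(h - u) over [0, h], namely e^h - 1 - h."""
    return math.expm1(h) - h


def integral_midpoint(h, n=10_000):
    """Midpoint-rule approximation of the same integral, used as a cross-check."""
    step = h / n
    return step * sum((i + 0.5) * step * math.exp(h - (i + 0.5) * step) for i in range(n))


for h in (1.0, 0.1, 0.01, 0.001):
    exact = integral_closed_form(h)
    approx = integral_midpoint(h)
    print(f"dt={h:g}  closed={exact:.3e}  midpoint={approx:.3e}  ratio/dt^2={exact / h**2:.6f}")
```

Using `math.expm1` avoids the cancellation that `math.exp(h) - 1` would suffer for small h.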
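Next, suppose the claim holds for m - 1, that is, $E_s|X_t - X_s|^{2(m-1)} \le K_{m-1}(\Delta t)^{m-1}$. By Itô's formula, 

$$(X_t - X_s)^{2m} = 2m\int_s^t \mu(X_u)(X_u - X_s)^{2m-1}du + m(2m-1)\int_s^t (X_u - X_s)^{2(m-1)}g_u\,du + 2m\int_s^t (X_u - X_s)^{2m-1}\sqrt{g_u}\,dB_u$$
$$\le m\int_s^t \{\mu(X_u)^2 + (X_u - X_s)^2\}(X_u - X_s)^{2(m-1)}du + m(2m-1)\int_s^t (X_u - X_s)^{2(m-1)}g_u\,du + 2m\int_s^t (X_u - X_s)^{2m-1}\sqrt{g_u}\,dB_u$$
$$\le 2m^2L_2\int_s^t (X_u - X_s)^{2(m-1)}du + 2m\int_s^t (X_u - X_s)^{2m}du + 2m\int_s^t (X_u - X_s)^{2m-1}\sqrt{g_u}\,dB_u$$

for some constant $L_2$ such that $\mu(X_u)^2 \le L_2$ and $g_u \le L_2$, which exists since $X_u$, and hence $\mu(X_u)$ and $g_u$, are bounded. 

Hence, 

$$E_s|X_t - X_s|^{2m} \le 2m^2L_2\int_s^t E_s|X_u - X_s|^{2(m-1)}du + 2m\int_s^t E_s|X_u - X_s|^{2m}du.$$

By the induction hypothesis, there is a constant $K_{m-1}$ such that 

$$E_s|X_t - X_s|^{2m} \le 2m^2L_2\int_s^t K_{m-1}(u - s)^{m-1}du + 2m\int_s^t E_s|X_u - X_s|^{2m}du$$
$$= 2mL_2K_{m-1}(\Delta t)^m + 2m\int_s^t E_s|X_u - X_s|^{2m}du \le 2m^2L_2K_{m-1}(\Delta t)^m + 2m\int_s^t E_s|X_u - X_s|^{2m}du.$$

By the Gronwall inequality, 

$$E_s|X_t - X_s|^{2m} \le 2m^2L_2K_{m-1}(\Delta t)^m + 2m\int_s^t 2m^2L_2K_{m-1}(u - s)^m e^{2m(t-u)}du.$$

Since the last integral is bounded by $2m^2L_2K_{m-1}e^{2m\Delta t}(\Delta t)^{m+1}/(m+1)$, the right-hand side is $O((\Delta t)^m)$, 
so the claim holds for m. This completes the proof. 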
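Lemma 1 can be illustrated by simulation: for m = 1 and m = 2 the ratios E_s|X_t - X_s|^2/Δt and E_s|X_t - X_s|^4/(Δt)^2 should stay roughly stable as Δt shrinks. The sketch below is a hedged illustration, not the paper's experiment: the bounded drift `drift(x) = -0.5 tanh(x)`, the bounded volatility `vol(x) = 0.04 (1 + 0.5 sin x)`, the start X_s = 0 and the Euler-Maruyama discretization are all assumptions chosen for the example.

```python
import math
import random


def drift(x):
    # Hypothetical bounded drift mu(x), used only for this illustration.
    return -0.5 * math.tanh(x)


def vol(x):
    # Hypothetical bounded volatility g(x), the squared diffusion coefficient.
    return 0.04 * (1.0 + 0.5 * math.sin(x))


def moment_ratios(dt, n_paths=10_000, n_sub=20, x0=0.0, seed=1):
    """Estimate E|X_t - X_s|^2 / dt and E|X_t - X_s|^4 / dt^2 by Euler-Maruyama."""
    rng = random.Random(seed)
    h = dt / n_sub
    sq_h = math.sqrt(h)
    m2 = m4 = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_sub):
            x += drift(x) * h + math.sqrt(vol(x)) * sq_h * rng.gauss(0.0, 1.0)
        d2 = (x - x0) ** 2
        m2 += d2
        m4 += d2 * d2
    return m2 / n_paths / dt, m4 / n_paths / dt**2


for dt in (0.1, 0.01, 0.001):
    r2, r4 = moment_ratios(dt)
    print(f"dt={dt:g}  E|dX|^2/dt={r2:.4f}  E|dX|^4/dt^2={r4:.5f}")
```

With these assumed coefficients the increments are close to Gaussian with variance about g(0)Δt, so the two ratios should hover near 0.04 and 3(0.04)^2 = 0.0048, up to Monte Carlo noise.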

Lemma 2 For any k ($1 \le k \le n$), let t and s be $t_k$ and $t_{k-1}$, respectively. Then 
$E_s|X_t - \hat X_t|^2$ is of order $O(\Delta t)$. 

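proof: In this proof, $\mu(\cdot)$ and $\mu'(\cdot)$ stand for $\mu(\cdot;\eta)$ and $\mu'(\cdot;\eta)$, respectively. By Itô's 
formula, 

$$d(X - \hat X)^2 = 2(X - \hat X)\,dX - 2(X - \hat X)\,d\hat X + d\langle X\rangle + d\langle \hat X\rangle - 2\,d\langle X, \hat X\rangle$$
$$= 2(X - \hat X)(dX - d\hat X) + (\sqrt{g} - \sqrt{\hat Y})^2dt$$
$$= 2(X - \hat X)(\mu(X;\eta) - \mu(\hat X;\hat\eta))\,dt + 2(X - \hat X)(\sqrt{g} - \sqrt{\hat Y})\,dB + (\sqrt{g} - \sqrt{\hat Y})^2dt.$$

For simplicity, we omit the time subscript unless this causes confusion. Here, consider 
the first-order Taylor expansions of $\mu$ and $\hat\mu = \mu(\cdot;\hat\eta)$. For $\mu$, there exists $\nu \in [s, t]$ such that 
$\mu(X_t) = \mu(X_s) + \mu'(X_\nu)(X_t - X_s)$. For $\hat\mu$, we expand around $\eta$: $\mu(\hat X_t;\hat\eta) = \mu(\hat X_t;\eta) + 
\partial_\eta\mu(\hat X_t;\bar\eta)(\hat\eta - \eta)$ for some $\bar\eta$, where $\partial_\eta$ stands for the gradient with respect to $\eta$. The rate of 
convergence of $\hat\eta$ is $(N\Delta t)^{-1/2}$. Here note $\tau = n\Delta t$. Hence, 

$$\mu(X;\eta) - \mu(\hat X;\hat\eta) = \mu(X;\eta) - \mu(\hat X;\eta) - \partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta) \qquad (24)$$
$$= \mu(X_s) + \mu'(X_\nu)(X - X_s) - \{\mu(X_s) + \mu'(X_s)(\hat X - X_s)\} - \partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta)$$
$$= \mu'(X_s)(X - \hat X) + (\mu'(X_\nu) - \mu'(X_s))(X - X_s) - \partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta).$$

Here note $\hat X_s = X_s$ by the setting. Using this, 

$$d(X - \hat X)^2 = 2\mu'(X_s)(X - \hat X)^2dt + 2(X - \hat X)(\mu'(X_\nu) - \mu'(X_s))(X - X_s)\,dt$$
$$- 2(X - \hat X)\,\partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta)\,dt + 2(X - \hat X)(\sqrt{g} - \sqrt{\hat Y})\,dB + (\sqrt{g} - \sqrt{\hat Y})^2dt.$$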

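$\mu'$ is bounded by L because of the Lipschitz condition, and both g and $\hat Y$ are bounded. So, for some 
$K_1$, $|\mu'| \le K_1$, $g \le K_1$ and $\hat Y \le K_1$. Note also $\Delta t \le \tau$. Then, 

$$E_s|X_t - \hat X_t|^2 \le 2K_1\int_s^t E_s|X_u - \hat X_u|^2du + 4K_1\int_s^t E_s|(X_u - \hat X_u)(X_u - X_s)|\,du$$
$$+ 2\int_s^t E_s|(X_u - \hat X_u)\,\partial_\eta\mu(\hat X_u;\bar\eta)(\hat\eta - \eta)|\,du + 2K_1\Delta t.$$

First, by Lemma 1, 

$$2\int_s^t E_s|(X_u - \hat X_u)(X_u - X_s)|\,du \le \int_s^t E_s|X_u - \hat X_u|^2du + \int_s^t E_s|X_u - X_s|^2du$$
$$\le \int_s^t E_s|X_u - \hat X_u|^2du + K_2(\Delta t)^2$$

for some constant $K_2$. Next, since $E_s|\partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta)|^2 \le K_3(N\Delta t)^{-1}$ for some constant $K_3$, 

$$2\int_s^t E_s|(X_u - \hat X_u)\,\partial_\eta\mu(\hat X_u;\bar\eta)(\hat\eta - \eta)|\,du \le \int_s^t E_s|X_u - \hat X_u|^2du + \int_s^t E_s|\partial_\eta\mu(\hat X_u;\bar\eta)(\hat\eta - \eta)|^2du$$
$$\le \int_s^t E_s|X_u - \hat X_u|^2du + K_3/N$$
$$\le \int_s^t E_s|X_u - \hat X_u|^2du + K_3\Delta t.$$

We get the last inequality from $1/N \le T/N = \Delta t$, which holds because we let $T = N\Delta t \to \infty$. Hence, 

$$E_s|X_t - \hat X_t|^2 \le (4K_1 + 1)\int_s^t E_s|X_u - \hat X_u|^2du + 2K_1\Delta t + 2K_1K_2(\Delta t)^2 + K_3\Delta t$$
$$\le (4K_1 + 1)\int_s^t E_s|X_u - \hat X_u|^2du + (2K_1 + 2K_1K_2\tau + K_3)\Delta t.$$

By the Gronwall inequality, 

$$E_s|X_t - \hat X_t|^2 \le (2K_1 + 2K_1K_2\tau + K_3)\Delta t + (4K_1 + 1)\int_s^t (2K_1 + 2K_1K_2\tau + K_3)(u - s)\,e^{(4K_1+1)(t-u)}du,$$

which is $O(\Delta t)$. This completes the proof. 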
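The Gronwall bound at the end of the proof of Lemma 2 has a closed form, which makes its O(Δt) behavior explicit. Writing C = 2K_1 + 2K_1K_2τ + K_3 and A = 4K_1 + 1, one has A∫_s^t C(u - s)e^{A(t-u)}du = C(e^{AΔt} - 1 - AΔt)/A, so the bound divided by Δt tends to C. The sketch below evaluates this; the numerical values of K_1, K_2, K_3 and τ are hypothetical placeholders.

```python
import math


def gronwall_bound(dt, C, A):
    """Closed form of C*dt + A * integral_s^t C*(u - s)*exp(A*(t - u)) du with dt = t - s."""
    return C * dt + C * (math.expm1(A * dt) - A * dt) / A


# Hypothetical constants standing in for K1, K2, K3 and tau.
K1, K2, K3, tau = 1.0, 2.0, 0.5, 10.0
C = 2 * K1 + 2 * K1 * K2 * tau + K3
A = 4 * K1 + 1
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"dt={dt:g}  bound/dt={gronwall_bound(dt, C, A) / dt:.6f}  (C={C})")
```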
Lemma 3 There is a positive constant K such that 

$$(g_t - Y_t)^2 \le K\{(X_t - X_s)^2 + (X_t - X_s)^4\},$$

where t and s stand for $t_k$ and $t_{k-1}$, respectively. 

proof: The second-order Taylor expansion of $g_t = g(X_t)$ around $X_s$ is given as 

$$g_t = g_s + g'(X_s)(X_t - X_s) + \frac{g''(X_\xi)}{2}(X_t - X_s)^2,$$

where $X_\xi = (1 - \lambda)X_s + \lambda X_t$ for some $\lambda \in [0, 1]$. By the definition of Y, we have 

$$Y_t = Y_s + Y^1_s(X_t - X_s) + \frac{\theta}{2}(X_t - X_s)^2, \qquad Y_s = g(X_s).$$

Since $X_u$ is a bounded process, $g'(X_u)$, $g''(X_u)$ and $Y^1_u$ are all bounded. Hence, 

$$(g_t - Y_t)^2 = \left[(g'(X_s) - Y^1_s)(X_t - X_s) + \tfrac{1}{2}(g''(X_\xi) - \theta)(X_t - X_s)^2\right]^2$$
$$\le 2\left\{(g'(X_s) - Y^1_s)^2(X_t - X_s)^2 + \left(\tfrac{1}{2}(g''(X_\xi) - \theta)\right)^2(X_t - X_s)^4\right\}$$
$$\le K\{(X_t - X_s)^2 + (X_t - X_s)^4\}$$

for some positive constant K. 
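The constant in Lemma 3 can be made explicit: with a = g'(X_s) - Y^1_s and |b| ≤ (sup|g''| + |θ|)/2, one may take K = 2 max(a^2, b^2). The sketch below checks this numerically for a hypothetical volatility function g(x) = 0.04(1 + 0.5 sin x) (so sup|g''| = 0.02) and hypothetical coefficients X_s = 0.3, Y^1_s = 0.01, θ = 0.005; none of these values come from the paper.

```python
import math


def g(x):
    # Hypothetical volatility function (illustration only).
    return 0.04 * (1.0 + 0.5 * math.sin(x))


def g_prime(x):
    return 0.02 * math.cos(x)


G2_SUP = 0.02  # sup |g''(x)| for this g, since g''(x) = -0.02 sin(x)


def poly_approx(x, x_s, y1, theta):
    """Second-order polynomial Y_t = g(X_s) + Y^1 (X_t - X_s) + theta/2 (X_t - X_s)^2."""
    d = x - x_s
    return g(x_s) + y1 * d + 0.5 * theta * d * d


x_s, y1, theta = 0.3, 0.01, 0.005  # hypothetical coefficients
K = 2.0 * max((g_prime(x_s) - y1) ** 2, (0.5 * (G2_SUP + abs(theta))) ** 2)

worst = 0.0
for i in range(1, 2001):
    d = -2.0 + 4.0 * i / 2001  # increments X_t - X_s in (-2, 2), never exactly 0 here
    lhs = (g(x_s + d) - poly_approx(x_s + d, x_s, y1, theta)) ** 2
    worst = max(worst, lhs / (d**2 + d**4))
print(f"max ratio = {worst:.3e}, Lemma 3 constant K = {K:.3e}")
```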

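proof of Theorem 2: $\mu(\cdot)$ and $\mu'(\cdot)$ stand for $\mu(\cdot;\eta)$ and $\mu'(\cdot;\eta)$, respectively. By Itô's formula, 

$$d(Y - \hat Y)^2 = 2(Y - \hat Y)(dY - d\hat Y) + d\langle Y\rangle + d\langle \hat Y\rangle - 2\,d\langle Y, \hat Y\rangle$$
$$= 2(Y - \hat Y)\left(Y^1_t\,dX - Y^1_s\,d\hat X + \tfrac{\theta}{2}(g - \hat Y)\,dt\right) + (Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})^2dt.$$

Here, we omit the time subscript t and denote processes at time s by, for example, $X_s$. Firstly, 

$$Y^1_t\,dX - Y^1_s\,d\hat X = (Y^1_t\mu(X) - Y^1_s\mu(\hat X;\hat\eta))\,dt + (Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})\,dB.$$

Using $Y^1_t = Y^1_s + \theta(X_t - X_s)$ and (24), the coefficient of $dt$ is given as 

$$Y^1_t\mu(X) - Y^1_s\mu(\hat X;\hat\eta) = Y^1_s(\mu(X) - \mu(\hat X;\hat\eta)) + \theta\mu(X)(X - X_s)$$
$$= Y^1_s\{\mu'(X_s)(X - \hat X) + (\mu'(X_\nu) - \mu'(X_s))(X - X_s) - \partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta)\} + \theta\mu(X)(X - X_s)$$
$$= Y^1_s\mu'(X_s)(X - \hat X) + \{Y^1_s(\mu'(X_\nu) - \mu'(X_s)) + \theta\mu(X)\}(X - X_s) - Y^1_s\,\partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta).$$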
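Hence, 

$$d(Y - \hat Y)^2 = 2(Y - \hat Y)\{Y^1_s\mu'(X_s)(X - \hat X) + (Y^1_s(\mu'(X_\nu) - \mu'(X_s)) + \theta\mu(X))(X - X_s)$$
$$- Y^1_s\,\partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta) + \tfrac{\theta}{2}(g - \hat Y)\}\,dt$$
$$+ (Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})^2dt + 2(Y - \hat Y)(Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})\,dB$$
$$\le (Y - \hat Y)^2dt + \{Y^1_s\mu'(X_s)(X - \hat X) + (Y^1_s(\mu'(X_\nu) - \mu'(X_s)) + \theta\mu(X))(X - X_s)$$
$$- Y^1_s\,\partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta) + \tfrac{\theta}{2}(g - \hat Y)\}^2dt$$
$$+ (Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})^2dt + 2(Y - \hat Y)(Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})\,dB$$
$$\le (Y - \hat Y)^2dt + 4\{(Y^1_s\mu'(X_s))^2(X - \hat X)^2 + (Y^1_s(\mu'(X_\nu) - \mu'(X_s)) + \theta\mu(X))^2(X - X_s)^2$$
$$+ (Y^1_s\,\partial_\eta\mu(\hat X;\bar\eta)(\hat\eta - \eta))^2 + \tfrac{\theta^2}{4}(g - \hat Y)^2\}\,dt$$
$$+ (Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})^2dt + 2(Y - \hat Y)(Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})\,dB.$$

$Y^1_s$, $\mu$ and $\mu'$ are all bounded, and the rate of convergence of $\hat\eta$ is $(N\Delta t)^{-1/2}$. Hence, 
without loss of generality, there is a constant $K_1$ such that 

$$d(Y - \hat Y)^2 \le (Y - \hat Y)^2dt + K_1\{(X - \hat X)^2 + (X - X_s)^2 + (g - \hat Y)^2 + 1/(N\Delta t)\}\,dt$$
$$+ (Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})^2dt + 2(Y - \hat Y)(Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})\,dB.$$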

Firstly, 

$$(g_t - \hat Y_t)^2 = ((g_t - Y_t) + (Y_t - \hat Y_t))^2 \le 2\{(g_t - Y_t)^2 + (Y_t - \hat Y_t)^2\}.$$

Thanks to Lemma 3, there is a positive constant $K_2$ such that 

$$(g_t - Y_t)^2 \le K_2\{(X_t - X_s)^2 + (X_t - X_s)^4\}.$$

Next, we evaluate the coefficient of $dt$ in the second line: 

$$(Y^1_t\sqrt{g_t} - Y^1_s\sqrt{\hat Y_t})^2 = \{Y^1_s(\sqrt{g_t} - \sqrt{\hat Y_t}) + \theta\sqrt{g_t}(X_t - X_s)\}^2$$
$$\le 2\{(Y^1_s(\sqrt{g_t} - \sqrt{\hat Y_t}))^2 + (\theta\sqrt{g_t}(X_t - X_s))^2\}.$$

Since X is a bounded process, $Y^1_s$ and g are also bounded. Hence, there is a positive 
constant $K_3$ such that 

$$(Y^1_t\sqrt{g_t} - Y^1_s\sqrt{\hat Y_t})^2 \le K_3\{(\sqrt{g_t} - \sqrt{\hat Y_t})^2 + (X_t - X_s)^2\}.$$

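Furthermore, 

$$(\sqrt{g_t} - \sqrt{\hat Y_t})^2 = \{(\sqrt{g_t} - \sqrt{g_s}) + (\sqrt{g_s} - \sqrt{Y_t}) + (\sqrt{Y_t} - \sqrt{\hat Y_t})\}^2$$
$$\le 3\{(\sqrt{g_t} - \sqrt{g_s})^2 + (\sqrt{g_s} - \sqrt{Y_t})^2 + (\sqrt{Y_t} - \sqrt{\hat Y_t})^2\}.$$

Because of the Lipschitz condition on $\sqrt{g}$ (with constant L), 

$$(\sqrt{g_t} - \sqrt{g_s})^2 \le L^2(X_t - X_s)^2.$$

To evaluate the second and third terms, we introduce stopping times for a sufficiently 
small $\epsilon > 0$ as follows: 

$$T_\epsilon = \inf\{t > 0;\ Y_t < \epsilon\}.$$

As for $\hat Y_t$, since its sample path is not necessarily continuous, we first define a stopping 
time $S_k$ for $t \in [t_{k-1}, t_k)$ as follows: 

$$S_k = \inf\{t > t_{k-1};\ \hat Y_t < \epsilon\}.$$

Then, a stopping time S is defined by 

$$S = \begin{cases} S_k, & t \in [t_{k-1}, t_k)\ (1 \le k \le n), \\ \tau, & t \ge t_n = \tau. \end{cases}$$

Using these stopping times, we redefine $Y_t$ and $\hat Y_t$ as $Y_t = Y_{t\wedge T_\epsilon\wedge S}$ and 
$\hat Y_t = \hat Y_{t\wedge T_\epsilon\wedge S}$, respectively. 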



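Then, noting $g_s = Y_s$, 

$$(\sqrt{g_s} - \sqrt{Y_t})^2 = (\sqrt{Y_t} - \sqrt{Y_s})^2 = \frac{(Y_t - Y_s)^2}{(\sqrt{Y_t} + \sqrt{Y_s})^2} \le \frac{(Y_t - Y_s)^2}{Y_t + Y_s} \le \frac{(Y_t - Y_s)^2}{2\epsilon}.$$

Similarly, 

$$(\sqrt{Y_t} - \sqrt{\hat Y_t})^2 \le \frac{(Y_t - \hat Y_t)^2}{Y_t + \hat Y_t} \le \frac{(Y_t - \hat Y_t)^2}{2\epsilon}.$$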
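The two square-root inequalities rest on the identity (√a - √b)^2 = (a - b)^2/(√a + √b)^2 together with (√a + √b)^2 ≥ a + b ≥ 2ε when a, b ≥ ε. The sketch below is a random spot check of the chain (√a - √b)^2 ≤ (a - b)^2/(a + b) ≤ (a - b)^2/(2ε); the value ε = 0.01 and the sampling range are arbitrary choices for illustration.

```python
import math
import random


def sqrt_gap_sq(a, b):
    """(sqrt(a) - sqrt(b))^2 for nonnegative a, b."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2


def holds(a, b, eps, rel=1e-12):
    """Check (sqrt a - sqrt b)^2 <= (a - b)^2/(a + b) <= (a - b)^2/(2 eps)."""
    middle = (a - b) ** 2 / (a + b)
    bound = (a - b) ** 2 / (2.0 * eps)
    return sqrt_gap_sq(a, b) <= middle and middle <= bound * (1.0 + rel)


rng = random.Random(0)
eps = 0.01
trials = 100_000
# Both values are drawn at or above eps, mimicking Y and Y-hat before the stopping times.
violations = sum(not holds(eps + rng.random(), eps + rng.random(), eps) for _ in range(trials))
print(f"violations: {violations} / {trials}")
```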
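Using the above inequalities (and enlarging $K_2$ if necessary so that also $(Y_t - Y_s)^2 \le K_2\{(X_t - X_s)^2 + (X_t - X_s)^4\}$, 
which holds because $Y_t - Y_s$ is a polynomial in $X_t - X_s$ with bounded coefficients), we get 

$$d(Y - \hat Y)^2 \le K_1(X - \hat X)^2dt + K_4(X - X_s)^2dt + K_5(X - X_s)^4dt + K_6(Y - \hat Y)^2dt$$
$$+ K_1/(N\Delta t)\,dt + 2(Y - \hat Y)(Y^1_t\sqrt{g} - Y^1_s\sqrt{\hat Y})\,dB,$$

where $K_4$ is a positive constant collecting the coefficients of $(X - X_s)^2$, and 

$$K_5 = 2K_1K_2 + \frac{3K_2K_3}{2\epsilon}, \qquad K_6 = 1 + 2K_1 + \frac{3K_3}{2\epsilon}.$$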

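Hence, 

$$E_s|Y_t - \hat Y_t|^2 - E_s|Y_s - \hat Y_s|^2 = E_s|Y_t - \hat Y_t|^2 - |Y_s - \hat Y_s|^2$$
$$\le K_1\int_s^t E_s|X_u - \hat X_u|^2du + K_4\int_s^t E_s|X_u - X_s|^2du$$
$$+ K_5\int_s^t E_s|X_u - X_s|^4du + K_6\int_s^t E_s|Y_u - \hat Y_u|^2du + K_1/N.$$

Thanks to Lemmas 1 and 2, the integrands of the first and second terms have the order of 
$(u - s)$ and that of the third has the order of $(u - s)^2$. Consequently, their integrals have the order of $(\Delta t)^2$ 
and $(\Delta t)^3$, respectively. The resulting inequality is given by 

$$E_s|Y_t - \hat Y_t|^2 \le |Y_s - \hat Y_s|^2 + \phi(t) + K_6\int_s^t E_s|Y_u - \hat Y_u|^2du,$$

where $\phi(t)$ consists of two parts: one has the order of $1/N$ and the other has the order 
of $(\Delta t)^2$. By the Gronwall inequality, we get 

$$E_s|Y_t - \hat Y_t|^2 \le |Y_s - \hat Y_s|^2 + \phi(t) + K_6\int_s^t (|Y_s - \hat Y_s|^2 + \phi(u))e^{K_6(t-u)}du$$
$$= |Y_s - \hat Y_s|^2e^{K_6\Delta t} + \phi(t) + K_6\int_s^t \phi(u)e^{K_6(t-u)}du.$$

Note that the sum of the second and third terms, denoted by $\psi$, can be expressed as 
$\psi_1(1/N) + \psi_2((\Delta t)^2) + \psi_3(\Delta t/N, (\Delta t)^3)$, where $\psi_1$ has the order of $1/N$, $\psi_2$ has the order of $(\Delta t)^2$, 
and $\psi_3$ has the order of the sum of $\Delta t/N$ and $(\Delta t)^3$. Applying the unconditional expectation 
to both sides and substituting $t_k$ and $t_{k-1}$ for t and s, we get 

$$E|Y_{t_k} - \hat Y_{t_k}|^2 \le e^{K_6\Delta t}E|Y_{t_{k-1}} - \hat Y_{t_{k-1}}|^2 + \psi.$$

Multiplying both sides by $e^{(n-k)K_6\Delta t}$ and summing up from $k = 1$ to n, 

$$E|Y_{t_n} - \hat Y_{t_n}|^2 \le e^{nK_6\Delta t}E|Y_0 - \hat Y_0|^2 + \psi\,\frac{e^{nK_6\Delta t} - 1}{e^{K_6\Delta t} - 1}.$$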
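But $Y_0 = \hat Y_0$ by the setting. Hence, with $t_n = \tau$, 

$$E|Y_\tau - \hat Y_\tau|^2 \le \frac{\psi}{\Delta t}\cdot\frac{\Delta t\,(e^{nK_6\Delta t} - 1)}{e^{K_6\Delta t} - 1}.$$

On one hand, since $n\Delta t = \tau$, 

$$\lim_{\Delta t\to 0}\frac{\Delta t\,(e^{K_6\tau} - 1)}{e^{K_6\Delta t} - 1} = \frac{e^{K_6\tau} - 1}{K_6}.$$

On the other hand, $\psi/\Delta t$ can be expressed as $\psi_1(1/(N\Delta t)) + \psi_2(\Delta t) + \psi_3(1/N, (\Delta t)^2)$. 
Hence $E|Y_\tau - \hat Y_\tau|^2$ converges to zero as $N \to \infty$, $\Delta t \to 0$ and $N\Delta t \to \infty$. Finally, by 
letting $\epsilon \downarrow 0$, we get the desired result. 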
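The last two steps of this proof, summing the one-step recursion and letting Δt → 0, can be checked numerically. The sketch below, with hypothetical values for K_6, ψ and τ, iterates a_k = e^{K_6Δt}a_{k-1} + ψ from a_0 = 0, compares the result with the closed form ψ(e^{nK_6Δt} - 1)/(e^{K_6Δt} - 1), and shows Δt(e^{K_6τ} - 1)/(e^{K_6Δt} - 1) approaching (e^{K_6τ} - 1)/K_6. The same computation, with K_6 replaced by 2c and ψ by 2εΔt, covers the final step in the proof of Theorem 3.

```python
import math


def iterate_bound(psi, k6, dt, n, a0=0.0):
    """Apply a_k = exp(K6*dt) * a_{k-1} + psi for k = 1, ..., n, starting from a_0 = a0."""
    growth = math.exp(k6 * dt)
    a = a0
    for _ in range(n):
        a = growth * a + psi
    return a


def summed_bound(psi, k6, dt, n):
    """Closed form of the same sum when a_0 = 0: psi * (e^{n K6 dt} - 1) / (e^{K6 dt} - 1)."""
    return psi * math.expm1(n * k6 * dt) / math.expm1(k6 * dt)


k6, tau = 2.0, 1.0                 # hypothetical K6 and horizon tau = n * dt
limit = math.expm1(k6 * tau) / k6  # (e^{K6 tau} - 1) / K6
for n in (10, 100, 1_000, 10_000):
    dt = tau / n
    factor = dt * math.expm1(k6 * tau) / math.expm1(k6 * dt)
    print(f"n={n:>6}  dt*(e^(K6*tau)-1)/(e^(K6*dt)-1) = {factor:.6f}  limit = {limit:.6f}")
```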

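Lemma 4 Let $v_1(x)$ and $v_2(x)$ be the (1,1) and (1,2) elements of $V_{t|s}(x)$. For any $\epsilon > 0$, 
there exists a $\delta > 0$ such that for any $\Delta t < \delta$, 

$$\left|\frac{v_2(x)}{v_1(x)} - \frac{v_2(y)}{v_1(y)}\right| < \epsilon\Delta t.$$

proof: Now suppose $v_i(x)$ $(i = 1, 2)$ is a function of $\Delta t$, which is denoted by $v_i(\Delta t;x)$. 
Then, we want to evaluate 

$$\lim_{\Delta t\to 0}\left(\frac{v_2(\Delta t;x)}{v_1(\Delta t;x)} - \frac{v_2(\Delta t;y)}{v_1(\Delta t;y)}\right).$$

Here note $\lim_{\Delta t\to 0}v_i(\Delta t;x) = 0$ and $\lim_{\Delta t\to 0}v_1'(\Delta t;x) = x$. For simplicity, we denote 
$v_i(\Delta t;x)$ and $v_i(\Delta t;y)$ by $x_i$ and $y_i$, respectively. Applying l'Hôpital's rule twice, 

$$\lim_{\Delta t\to 0}\left(\frac{x_2}{x_1} - \frac{y_2}{y_1}\right) = \lim_{\Delta t\to 0}\frac{x_2y_1 - x_1y_2}{x_1y_1} = \lim_{\Delta t\to 0}\frac{(x_2''y_1 + 2x_2'y_1' + x_2y_1'') - (x_1''y_2 + 2x_1'y_2' + x_1y_2'')}{x_1''y_1 + 2x_1'y_1' + x_1y_1''}.$$

On the other hand, 

$$\lim_{\Delta t\to 0}\frac{x_1}{\Delta t} = \lim_{\Delta t\to 0}\frac{x_1'}{1} = x.$$

In other words, for any $\epsilon > 0$, there exists a $\delta > 0$ such that, for all $\Delta t < \delta$, 

$$\frac{1}{\Delta t}\left|\frac{x_2}{x_1} - \frac{y_2}{y_1}\right| < \epsilon.$$

This completes the proof. 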






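Lemma 5 Let $v_1(x)$ and $v_2(x)$ be as above, and let $v_3(x)$ be the (2,2) element of $V_{t|s}(x)$. 
For any $\epsilon > 0$, there exists a $\delta > 0$ such that for all $\Delta t < \delta$, 

$$\frac{v_1(x)v_3(x) - v_2(x)^2}{v_1(x)} < \epsilon\Delta t.$$

proof: Now suppose $v_i(x)$ $(1 \le i \le 3)$ are functions of $\Delta t$, which are simply denoted by $x_i$. 
And note $\lim_{\Delta t\to 0}x_i = 0$, $\lim_{\Delta t\to 0}x_1' = x$ and $\lim_{\Delta t\to 0}x_i'' < \infty$. Applying l'Hôpital's rule twice, 

$$\lim_{\Delta t\to 0}\frac{x_1x_3 - x_2^2}{x_1\Delta t} = \lim_{\Delta t\to 0}\frac{(x_1''x_3 + 2x_1'x_3' + x_1x_3'') - 2\{(x_2')^2 + x_2x_2''\}}{x_1''\Delta t + 2x_1'}.$$

Hence, for any $\epsilon > 0$, there exists a $\delta > 0$ such that, for all $\Delta t < \delta$, 

$$\frac{x_1x_3 - x_2^2}{x_1\Delta t} < \epsilon.$$

This completes the proof. 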

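proof of Theorem 3: For simplicity, let t and s be $t_k$ and $t_{k-1}$, respectively, and let 
$v_1(x)$, $v_2(x)$ and $v_3(x)$ be the (1,1), (1,2) and (2,2) elements of $V_{t|s}(x)$. Recall 

$$\hat Y_{t|t} = \hat Y_{t|s} + \kappa(X_t - \hat X_{t|s}),$$

where 

$$\kappa = \frac{v_2(\hat Y_{s|s})}{v_1(\hat Y_{s|s})}.$$

Hence, 

$$E_s[(Y_t - \hat Y_{t|t})^2] = E_s[(Y_t - \hat Y_{t|s})^2] - 2\kappa E_s[(Y_t - \hat Y_{t|s})(X_t - \hat X_{t|s})] + \kappa^2E_s[(X_t - \hat X_{t|s})^2].$$

Here note $\kappa \in \mathcal{G}_s$. 

Firstly, we evaluate the first term. Noticing $\mathcal{G}_s \subset \mathcal{F}_s$, 

$$E_s[(Y_t - \hat Y_{t|s})^2] = E_s[\{(Y_t - E_s[Y_t]) + (E_s[Y_t] - \hat Y_{t|s})\}^2]$$
$$= E_s[(Y_t - E_s[Y_t])^2] + 2E_s[(Y_t - E_s[Y_t])\,E_s[Y_t - \hat Y_{t|s}]] + E_s[(E_s[Y_t] - \hat Y_{t|s})^2]$$
$$= v_3(Y_s) + e^{2c\Delta t}(Y_s - \hat Y_{s|s})^2.$$

Secondly, since $\hat X_{t|s} = E_s[X_t]$, 

$$E_s[(Y_t - \hat Y_{t|s})(X_t - \hat X_{t|s})] = E_s[(Y_t - E_s[Y_t])(X_t - E_s[X_t])] + E_s[E_s[Y_t - \hat Y_{t|s}](X_t - E_s[X_t])]$$
$$= v_2(Y_s),$$

and, in the same way, $E_s[(X_t - \hat X_{t|s})^2] = v_1(Y_s)$. 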
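Hence, 

$$E_s[(Y_t - \hat Y_{t|t})^2] = v_3(Y_s) + e^{2c\Delta t}(Y_s - \hat Y_{s|s})^2 - 2\frac{v_2(\hat Y_{s|s})}{v_1(\hat Y_{s|s})}v_2(Y_s) + \left(\frac{v_2(\hat Y_{s|s})}{v_1(\hat Y_{s|s})}\right)^2v_1(Y_s)$$
$$= e^{2c\Delta t}(Y_s - \hat Y_{s|s})^2 + v_1(Y_s)\left(\frac{v_2(Y_s)}{v_1(Y_s)} - \frac{v_2(\hat Y_{s|s})}{v_1(\hat Y_{s|s})}\right)^2 + \frac{v_1(Y_s)v_3(Y_s) - v_2(Y_s)^2}{v_1(Y_s)}.$$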
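The decomposition used here is the algebraic identity v_3 - 2κv_2 + κ^2 v_1 = v_1(v_2/v_1 - κ)^2 + (v_1v_3 - v_2^2)/v_1 for v_1 > 0, which separates the gain mismatch (handled by Lemma 4) from the Schur-complement term (handled by Lemma 5). The sketch below is a random spot check of that identity; the sampling ranges are arbitrary, and v_3 is drawn so that the 2×2 covariance stays positive semidefinite.

```python
import random


def filtered_mse(v1, v2, v3, kappa):
    """v3 - 2*kappa*v2 + kappa^2*v1, the conditional MSE after an update with gain kappa."""
    return v3 - 2.0 * kappa * v2 + kappa**2 * v1


def decomposed(v1, v2, v3, kappa):
    """The same quantity written as v1*(v2/v1 - kappa)^2 + (v1*v3 - v2^2)/v1."""
    return v1 * (v2 / v1 - kappa) ** 2 + (v1 * v3 - v2**2) / v1


rng = random.Random(42)
max_err = 0.0
for _ in range(10_000):
    v1 = rng.uniform(0.1, 2.0)
    v2 = rng.uniform(-1.0, 1.0)
    v3 = v2**2 / v1 + rng.uniform(0.0, 1.0)  # keeps the covariance positive semidefinite
    kappa = rng.uniform(-2.0, 2.0)
    max_err = max(max_err, abs(filtered_mse(v1, v2, v3, kappa) - decomposed(v1, v2, v3, kappa)))
print(f"largest discrepancy: {max_err:.2e}")
```

When κ equals v_2/v_1 evaluated at the same argument, the first term vanishes and only (v_1v_3 - v_2^2)/v_1 remains.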

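Due to Lemmas 4 and 5, for any $\epsilon > 0$, there exists a $\delta > 0$ such that for all $\Delta t < \delta$, 

$$v_1(Y_s)\left(\frac{v_2(Y_s)}{v_1(Y_s)} - \frac{v_2(\hat Y_{s|s})}{v_1(\hat Y_{s|s})}\right)^2 < \epsilon\Delta t, \qquad \frac{v_1(Y_s)v_3(Y_s) - v_2(Y_s)^2}{v_1(Y_s)} < \epsilon\Delta t.$$

Hence, 

$$E_s[(Y_t - \hat Y_{t|t})^2] < e^{2c\Delta t}(Y_s - \hat Y_{s|s})^2 + 2\epsilon\Delta t.$$

Applying the unconditional expectation, we get 

$$E[(Y_t - \hat Y_{t|t})^2] < e^{2c\Delta t}E[(Y_s - \hat Y_{s|s})^2] + 2\epsilon\Delta t.$$

Recall that t and s stand for $t_k$ and $t_{k-1}$, respectively. By multiplying both sides by $e^{2c\Delta t(n-k)}$ 
and summing up from $k = 1$ to n, we get 

$$E[(Y_{t_n} - \hat Y_{t_n|t_n})^2] < e^{2cn\Delta t}E[(Y_0 - \hat Y_{0|0})^2] + 2\epsilon\Delta t\,\frac{e^{2cn\Delta t} - 1}{e^{2c\Delta t} - 1} = 2\epsilon\Delta t\,\frac{e^{2cn\Delta t} - 1}{e^{2c\Delta t} - 1}.$$

Here note $Y_0 = \hat Y_{0|0}$ by the setting, $t_n = \tau$ and $n\Delta t = \tau$. Letting $\Delta t$ go to zero, 

$$\lim_{\Delta t\to 0}E[(Y_\tau - \hat Y_{\tau|\tau})^2] \le \epsilon\,\frac{e^{2c\tau} - 1}{c}.$$

Since $\epsilon$ is arbitrary, this completes the proof. 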
            lin      quad     cube     nlin 
semi  mean  1.0083   0.9454   2.3153   0.2597 
      std   1.0397   0.9621   2.6087   0.2544 
ker   mean  0.9747   1.0691   2.6172   0.2828 
      std   1.1800   1.3238   3.7640   0.3272 

Table 1: Means (mean) and standard deviations (std) of 1,000 RMSEs of the proposed 
model (semi) and the local linear model (ker). Actual values should be 
multiplied by 10~^. 
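The model comparison in the concluding remarks can be read off Table 1 directly. The sketch below (our own, with values transcribed from the table and left in the table's units, so the caption's scale factor cancels) computes the relative difference (semi - ker)/ker for each volatility specification; negative values mean the proposed model has the smaller statistic.

```python
# Values transcribed from Table 1, in the table's units.
TABLE1 = {
    "semi": {"mean": [1.0083, 0.9454, 2.3153, 0.2597], "std": [1.0397, 0.9621, 2.6087, 0.2544]},
    "ker": {"mean": [0.9747, 1.0691, 2.6172, 0.2828], "std": [1.1800, 1.3238, 3.7640, 0.3272]},
}
MODELS = ["lin", "quad", "cube", "nlin"]


def relative_change(stat):
    """(semi - ker) / ker per specification; negative means semi has the smaller value."""
    pairs = zip(MODELS, TABLE1["semi"][stat], TABLE1["ker"][stat])
    return {m: (s - k) / k for m, s, k in pairs}


for stat in ("mean", "std"):
    print(stat, {m: f"{r:+.1%}" for m, r in relative_change(stat).items()})
```

In this transcription the std row favors semi in every column, while for the lin specification the mean RMSE of ker (0.9747) is slightly below that of semi (1.0083).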
                   lin      quad     cube      nlin 
R - V_semi  mean   5.3036   5.7694   14.3737   1.5188 
            std    0.6111   1.3237    8.3288   0.1456 
R - V_ker   mean   5.3039   5.7744   14.3431   1.5208 
            std    0.6096   1.3240    8.2681   0.1452 

Table 2: Means (mean) and standard deviations (std) of the differences for 1,000 sample 
paths are presented. Actual values should be multiplied by 10~^. 
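Table 2 supports the remark that the two models give nearly identical results for the integrated volatility. A quick sketch (values transcribed from the table, in its units) computes the relative gap |semi - ker|/ker for each column of the two rows; every gap is below 1%, whereas the corresponding gaps in Table 1 reach roughly 31%.

```python
# Values transcribed from Table 2, in the table's units.
TABLE2 = {
    "mean": {"semi": [5.3036, 5.7694, 14.3737, 1.5188], "ker": [5.3039, 5.7744, 14.3431, 1.5208]},
    "std": {"semi": [0.6111, 1.3237, 8.3288, 0.1456], "ker": [0.6096, 1.3240, 8.2681, 0.1452]},
}
MODELS = ["lin", "quad", "cube", "nlin"]


def relative_gap(stat):
    """|semi - ker| / ker for each specification."""
    row = TABLE2[stat]
    return {m: abs(s - k) / k for m, s, k in zip(MODELS, row["semi"], row["ker"])}


for stat in ("mean", "std"):
    print(stat, {m: f"{g:.3%}" for m, g in relative_gap(stat).items()})
```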